METHODS TO ENGINEER MAMMALIAN-TYPE CARBOHYDRATE STRUCTURES

CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application claims priority to U.S. provisional application Ser. No.
60/344,169, filed Dec. 27, 2001, which is incorporated by reference herein in its
entirety.

FIELD OF THE INVENTION 

[0002] The present invention generally relates to modifying the glycosylation
structures of recombinant proteins expressed in fungi or other lower eukaryotes, to
more closely resemble the glycosylation of proteins of higher mammals, in
particular humans.

BACKGROUND OF THE INVENTION 
[0003] After DNA is transcribed and translated into a protein, further
post-translational processing involves the attachment of sugar residues, a process
known as glycosylation. Different organisms produce different glycosylation enzymes
(glycosyltransferases and glycosidases), and have different substrates (nucleotide
sugars) available, so that the glycosylation patterns as well as the composition of
the individual oligosaccharides, even of one and the same protein, will differ
depending on the host system in which the particular protein is being expressed.
Bacteria typically do not glycosylate proteins, and if they do, only in a very
unspecific manner (Moens, 1997). Lower eukaryotes such as filamentous fungi and
yeast add primarily mannose and mannosylphosphate sugars, whereas insect cells such
as Sf9 cells glycosylate proteins in yet another way. See, for example, (Bretthauer,
1999; Martinet, 1998; Weikert, 1999; Malissard, 2000; Jarvis, 1998; and Takeuchi,
1997).

[0004] Synthesis of a mammalian-type oligosaccharide structure consists of a
series of reactions in the course of which sugar residues are added and removed
while the protein moves along the secretory pathway in the host organism. The
enzymes which reside along the glycosylation pathway of the host organism or cell
determine the resulting glycosylation patterns of secreted proteins.
Unfortunately, the resulting glycosylation pattern of proteins expressed in lower
eukaryotic host cells differs substantially from the glycosylation found in higher
eukaryotes such as humans and other mammals (Bretthauer, 1999). Moreover, the
vastly different glycosylation pattern has, in some cases, been shown to increase
the immunogenicity of these proteins in humans and reduce their half-life
(Takeuchi, 1997). It would be desirable to produce human-like glycoproteins in
non-human host cells, especially lower eukaryotic cells.
[0005] The early steps of human glycosylation can be divided into at least two
different phases: (i) lipid-linked Glc3Man9GlcNAc2 oligosaccharides are assembled
by a sequential set of reactions at the membrane of the endoplasmic reticulum (ER)
and (ii) this oligosaccharide is transferred from the lipid anchor dolichyl
pyrophosphate onto de novo synthesized protein. The site of the specific transfer is
defined by an asparagine (Asn) residue in the sequence Asn-Xaa-Ser/Thr (see Fig.
1), where Xaa can be any amino acid except proline (Gavel, 1990). Further
processing by glucosidases and mannosidases occurs in the ER before the nascent
glycoprotein is transferred to the early Golgi apparatus, where additional mannose
residues are removed by Golgi-specific alpha (α)-1,2-mannosidases. Processing
continues as the protein proceeds through the Golgi. In the medial Golgi, a
number of modifying enzymes, including N-acetylglucosaminyltransferases (GnT
I, GnT II, GnT III, GnT IV, GnT V, GnT VI), mannosidase II and
fucosyltransferases, add and remove specific sugar residues (see, e.g., Figs. 2 and
3). Finally, in the trans-Golgi, galactosyltransferases and sialyltransferases produce
a glycoprotein structure that is released from the Golgi. It is this structure,
characterized by bi-, tri- and tetra-antennary structures, containing galactose,
fucose, N-acetylglucosamine and a high degree of terminal sialic acid, that gives
glycoproteins their human characteristics.
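
The Asn-Xaa-Ser/Thr rule described above (Xaa being any amino acid except proline) can be illustrated with a short sequence scan. The following is a minimal sketch only, assuming one-letter amino acid codes; the function name and the example peptide are hypothetical and are not taken from the present specification. A match indicates a potential N-glycosylation site, not that the site is actually occupied.

```python
import re

# Asn (N), then any residue except Pro (P), then Ser (S) or Thr (T).
# The zero-width lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical peptide, invented for illustration only.
example_peptide = "MKNGSAANPTLLNATQ"
print(find_sequons(example_peptide))  # [3, 13]; Asn8 is skipped (N-P-T)
```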

[0006] In nearly all eukaryotes, glycoproteins are derived from the common core
oligosaccharide precursor Glc3Man9GlcNAc2-PP-Dol, where PP-Dol stands for
dolichol-pyrophosphate (Fig. 1). Within the endoplasmic reticulum, synthesis and
processing of dolichol pyrophosphate bound oligosaccharides are identical
between all known eukaryotes. However, further processing of the core
oligosaccharide by yeast, once it has been transferred to a peptide leaving the ER
and entering the Golgi, differs significantly from humans as it moves along the
secretory pathway and involves the addition of several mannose sugars.
[0007] In yeast, these steps are catalyzed by Golgi-residing
mannosyltransferases, like Och1p, Mnt1p and Mnn1p, which sequentially add
mannose sugars to the core oligosaccharide. The resulting structure is undesirable
for the production of humanoid proteins and it is thus desirable to reduce or
eliminate mannosyltransferase activity. Mutants of S. cerevisiae deficient in
mannosyltransferase activity (for example och1 or mnn9 mutants) have been
shown to be non-lethal and display a reduced mannose content in the
oligosaccharide of yeast glycoproteins. Other oligosaccharide processing enzymes,
such as mannosylphosphate transferase, may also have to be eliminated depending
on the host's particular endogenous glycosylation pattern.
Lipid-Linked Oligosaccharide Precursors 

[0008] Of particular interest for this invention are the early steps of
N-glycosylation (Figs. 1 and 2). The study of alg (asparagine-linked glycosylation)
mutants defective in the biosynthesis of the Glc3Man9GlcNAc2-PP-Dol has helped
to elucidate the initial steps of N-glycosylation.

[0009] The ALG3 gene of S. cerevisiae has been successfully cloned and knocked
out by deletion (Aebi, 1996). ALG3 has been shown to encode the enzyme
Dol-P-Man:Man5GlcNAc2-PP-Dol mannosyltransferase, which is involved in the first
Dol-P-Man dependent mannosylation step from Man5GlcNAc2-PP-Dol to
Man6GlcNAc2-PP-Dol at the luminal side of the ER (Sharma, 2001) (Figs. 1 and
2). S. cerevisiae cells harboring a leaky alg3-1 mutation accumulate
Man5GlcNAc2-PP-Dol (Structure I) (Huffaker, 1983).



Structure I: Man5GlcNAc2

[Structure diagram not reproduced. Linkage labels: α-1,2-mannose; α-1,6-mannose;
α-1,3-mannose; β-1,4-mannose; β-1,4-GlcNAc; GlcNAc]



Man5GlcNAc2 (Structure I) and Man8GlcNAc2 accumulate in total cell
mannoprotein of an och1 mnn1 alg3 mutant (Nakanishi-Shindo, 1993). This
S. cerevisiae och1, mnn1, alg3 mutant was shown to be viable, but
temperature-sensitive, and to lack α-1,6 polymannose outer chains.

[0010] In another study, secretory proteins expressed in a strain deleted for alg3
(Δalg3 background) were studied for their resistance to
Endo-β-N-acetylglucosaminidase H (Endo H) (Aebi, 1996). Previous observations have
indicated that only those oligosaccharides larger than Man5GlcNAc2 are
susceptible to cleavage by Endo H (Hubbard, 1980). In the alg3-1 phenotype,
some glycoforms were sensitive to Endo H cleavage, confirming its leakiness,
whereas in the Δalg3 mutant all glycoforms appeared to be resistant and of the
Man5-type (Aebi, 1996), suggesting a tight phenotype and transfer of
Man5GlcNAc2 oligosaccharide structures onto the nascent polypeptide chain. No
obvious phenotype was connected with the inactivation of the ALG3 gene (Aebi,
1996). Secreted exoglucanase produced in a Saccharomyces cerevisiae alg3
mutant was found to contain between 35-44% underglycosylated and
unglycosylated forms, and only about 50% of the transferred oligosaccharides
remained resistant to Endo H treatment (Cueva, 1996). Exoglucanase (Exg), an
enzyme that contains two potential N-glycosylation sites at Asn165 and Asn325, was
analyzed in more detail. For Exg molecules that received two oligosaccharides it
was shown that the first N-glycosylation site (Asn165) was enriched in truncated
residues, whereas the second (Asn325) was enriched in regular oligosaccharides.
35-44% of secreted exoglucanase was non- or underglycosylated and about 73-78%
of all available N-glycosylation sites were occupied with either truncated or
regular oligosaccharides (Cueva, 1996).
Transfer of Glucosylated Lipid-Linked Oligosaccharides

[0011] Evidence suggests that, in mammalian cells, only glucosylated
lipid-linked oligosaccharides are transferred to nascent proteins (Turco, 1977), while
in yeast alg5, alg6 and dpg1 mutants, nonglucosylated oligosaccharides can be
transferred (Ballou, 1986; Runge, 1984). In a Saccharomyces cerevisiae alg8
mutant, underglucosylated GlcMan9GlcNAc2 is transferred (Runge, 1986).
Verostek and co-workers studied an alg3, sec18, gls1 mutant and proposed that
glucosylation of a Man5GlcNAc2 structure (Structure I, above) is relatively slow in
comparison to glucosylation of a lipid-linked Man9 structure. In addition, the
transfer of this Man5GlcNAc2 structure to protein appears to be about 5-fold more
efficient than the glucosylation to Glc3Man5GlcNAc2. The decreased rate of
Man5GlcNAc2 glucosylation in combination with the comparatively faster rate of
Man5 structure transfer onto nascent protein is believed to be the cause of the
observed accumulation of nonglucosylated Man5 structures in alg3 mutant yeast
(Verostek-a, 1993; Verostek-b, 1993).

[0012] Studies preceding the above work did not reveal any lipid-linked
glucosylated oligosaccharides (Orlean, 1990; Huffaker, 1983), allowing the
conclusion that glucosylated oligosaccharides are transferred at a much higher rate
than their nonglucosylated counterparts and thus are much harder to isolate.
Recent work has allowed the creation and study of yeast strains with un- and
hypoglucosylated oligosaccharides and has further confirmed the importance of the
addition of glucose to the antenna of lipid-linked oligosaccharides for substrate
recognition by the oligosaccharyltransferase complex (Reiss, 1996; Stagljar, 1994;
Burda, 1998). The decreased degree of glucosylation of the lipid-linked
Man5-oligosaccharides in an alg3 mutant negatively impacts the kinetics of the
transfer of lipid-linked oligosaccharides onto nascent protein and is believed to be
the cause for the strong underglycosylation of secreted proteins in an alg3 knock-out
strain (Aebi, 1996).
[0013] The assembly of the lipid-linked core oligosaccharide Man9GlcNAc2
occurs, as described above, at the membrane of the endoplasmic reticulum. The
additions of three glucose units to the α-1,3-antenna of the lipid-linked
oligosaccharides are the final reactions in the oligosaccharide assembly. First an
α-1,3 glucose residue is added, followed by another α-1,3 glucose residue and a
terminal α-1,2 glucose residue. Mutants accumulating dolichol-linked
Man9GlcNAc2 have been shown to be defective in the ALG6 locus, and Alg6p has
similarities to Alg8p, the α-1,3-glucosyltransferase catalyzing the addition of the
second α-1,3-linked glucose (Reiss, 1996). Cells with a defective ALG8 locus
accumulate dolichol-linked Glc1Man9GlcNAc2 (Runge, 1986; Stagljar, 1994). The
ALG10 locus encodes the α-1,2 glucosyltransferase responsible for the addition of
a single terminal glucose to Glc2Man9GlcNAc2-PP-Dol (Burda, 1998).
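
The ordered glucose additions just described can be summarized as a simple lookup from a defective locus to the precursor expected to accumulate. This is a simplified sketch under stated assumptions: each step is treated as a complete block (leaky alleles are ignored), the names `GLUCOSYLATION_STEPS` and `accumulated_precursor` are hypothetical, and the entry for an ALG10 defect is inferred from the substrate named in the text rather than reported there as an observation.

```python
from typing import Optional

# Ordered glucosylation steps on Man9GlcNAc2-PP-Dol, as described in the text:
# (locus, linkage of the glucose residue that the step adds).
GLUCOSYLATION_STEPS = [
    ("ALG6", "alpha-1,3"),   # first glucose
    ("ALG8", "alpha-1,3"),   # second glucose
    ("ALG10", "alpha-1,2"),  # terminal glucose
]


def accumulated_precursor(defective_locus: Optional[str] = None) -> str:
    """Name the lipid-linked oligosaccharide expected to accumulate.

    Assumes a tight block at the defective step; with no defect the fully
    glucosylated Glc3Man9GlcNAc2-PP-Dol is produced.
    """
    glucoses = 0
    for locus, _linkage in GLUCOSYLATION_STEPS:
        if locus == defective_locus:
            break  # this step and all later ones do not occur
        glucoses += 1
    prefix = f"Glc{glucoses}" if glucoses else ""
    return f"{prefix}Man9GlcNAc2-PP-Dol"


for locus in ("ALG6", "ALG8", "ALG10", None):
    print(locus, "->", accumulated_precursor(locus))
```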
Sequential Processing of N-glycans by Localized Enzyme Activities 
[0014] Sugar transferases and mannosidases line the inner (luminal) surface of
the ER and Golgi apparatus and thereby provide a "catalytic" surface that allows
for the sequential processing of glycoproteins as they proceed through the ER and
Golgi network. In fact, the multiple compartments of the cis, medial, and trans
Golgi and the trans-Golgi Network (TGN) provide the different localities in
which the ordered sequence of glycosylation reactions can take place. As a
glycoprotein proceeds from synthesis in the ER to full maturation in the late Golgi
or TGN, it is sequentially exposed to different glycosidases, mannosidases and
glycosyltransferases such that a specific carbohydrate structure may be synthesized.
Much work has been dedicated to revealing the exact mechanism by which these
enzymes are retained and anchored to their respective organelle. The evolving
picture is complex, but evidence suggests that the stem region, membrane spanning
region and cytoplasmic tail individually or in concert direct enzymes to the
membrane of individual organelles and thereby localize the associated catalytic
domain to that locus.

[0015] In some cases these specific interactions were found to function across
species. For example, the membrane spanning domain of α2,6-ST from rats, an
enzyme known to localize in the trans-Golgi of the animal, was shown to also
localize a reporter gene (invertase) in the yeast Golgi (Schwientek, 1995).
However, the very same membrane spanning domain as part of a full-length α2,6-ST
was retained in the ER and not further transported to the Golgi of yeast
(Krezdorn, 1994). A full-length GalT from humans was not even synthesized in
yeast, despite demonstrably high transcription levels. On the other hand, the
transmembrane region of the same human GalT fused to an invertase reporter was
able to direct localization to the yeast Golgi, albeit at low production levels.
Schwientek and co-workers have shown that fusing 28 amino acids of a yeast
mannosyltransferase (Mnt1), a region containing a cytoplasmic tail, a
transmembrane region and eight amino acids of the stem region, to the catalytic
domain of human GalT is sufficient for Golgi localization of an active GalT.
Other galactosyltransferases appear to rely on interactions with enzymes resident
in particular organelles since after removal of their transmembrane region they are
still able to localize properly. To date there exists no reliable way of predicting
whether a particular heterologously expressed glycosyltransferase or mannosidase
in a lower eukaryote will be (1) sufficiently translated, (2) catalytically active, or
(3) located to the proper organelle within the secretory pathway. Since all three of
these are necessary to effect glycosylation patterns in lower eukaryotes, a
systematic scheme to achieve the desired catalytic function and proper retention of
enzymes in the absence of predictive tools, which are currently not available, has
been designed.
Production of Therapeutic Glycoproteins 

[0016] A significant number of proteins isolated from humans or animals are
post-translationally modified, with glycosylation being one of the most significant
modifications. An estimated 70% of all therapeutic proteins are glycosylated and
thus currently rely on a production system (i.e., host cell) that is able to glycosylate
in a manner similar to humans. To date, most glycoproteins are made in a
mammalian host system. Several studies have shown that glycosylation plays an
important role in determining the (1) immunogenicity, (2) pharmacokinetic
properties, (3) trafficking, and (4) efficacy of therapeutic proteins. It is thus not
surprising that substantial efforts by the pharmaceutical industry have been
directed at developing processes to obtain glycoproteins that are as "humanoid" or
"human-like" as possible. This may involve the genetic engineering of such






WO 03/056914 PCT/US02/41510

mammalian cells to enhance the degree of sialylation (i.e., terminal addition of
sialic acid) of proteins expressed by the cells, which is known to improve
pharmacokinetic properties of such proteins. Alternatively, one may improve the
degree of sialylation by in vitro addition of such sugars using known
glycosyltransferases and their respective nucleotide sugars (e.g., 2,3
sialyltransferase and CMP-sialic acid).

[0017] Future research may reveal the biological and therapeutic significance of
specific glycoforms, thereby rendering the ability to produce such specific
glycoforms desirable. To date, efforts have concentrated on making proteins with
fairly well characterized glycosylation patterns, and expressing a cDNA encoding
such a protein in one of the following higher eukaryotic protein expression
systems:

1. Higher eukaryotes such as Chinese hamster ovary cells (CHO),
mouse fibroblast cells and mouse myeloma cells (Werner, 1998);

2. Transgenic animals such as goats, sheep, mice and others (Dente,
1988); (Cole, 1994); (McGarvey, 1995); (Bardor, 1999);

3. Plants (Arabidopsis thaliana, tobacco, etc.) (Staub, 2000);
(McGarvey, 1995); (Bardor, 1999);

4. Insect cells (Spodoptera frugiperda Sf9, Sf21, Trichoplusia ni, etc.)
in combination with recombinant baculoviruses such as Autographa californica
multiple nuclear polyhedrosis virus, which infects lepidopteran cells (Altmann,
1999).

[0018] While most higher eukaryotes carry out glycosylation reactions that are
similar to those found in humans, recombinant human proteins expressed in the
above mentioned host systems invariably differ from their "natural" human
counterpart (Raju, 2000). Extensive development work has thus been directed at
finding ways to improve the "human character" of proteins made in these
expression systems. This includes the optimization of fermentation conditions and
the genetic modification of protein expression hosts by introducing genes encoding
enzymes involved in the formation of human-like glycoforms (Werner, 1998);
(Weikert, 1999); (Andersen, 1994); (Yang, 2000). Inherent problems associated
with all mammalian expression systems have not been solved.
[0019] Fermentation processes based on mammalian cell culture (e.g., CHO,
murine, or human cells), for example, tend to be very slow (fermentation times in
excess of one week are not uncommon), often yield low product titers, require
expensive nutrients and cofactors (e.g., bovine fetal serum), are limited by
programmed cell death (apoptosis), and often do not enable expression of
particular therapeutically valuable proteins. More importantly, mammalian cells
are susceptible to viruses that have the potential to be human pathogens, and
stringent quality controls are required to assure product safety. This is of particular
concern since many such processes require the addition of complex and
temperature sensitive media components that are derived from animals (e.g.,
bovine calf serum), which may carry agents pathogenic to humans such as bovine
spongiform encephalopathy (BSE) prions or viruses. Moreover, the production of
therapeutic compounds is preferably carried out in a well-controlled sterile
environment. An animal farm, no matter how cleanly kept, does not constitute
such an environment, thus constituting an additional problem in the use of
transgenic animals for manufacturing high volume therapeutic proteins.
[0020] Most, if not all, currently produced therapeutic glycoproteins are therefore
expressed in mammalian cells and much effort has been directed at improving (i.e.,
"humanizing") the glycosylation pattern of these recombinant proteins. Changes in
medium composition as well as the co-expression of genes encoding enzymes
involved in human glycosylation have been successfully employed (see, for
example, Weikert, 1999).

[0021] While recombinant proteins similar to their human counterparts can be
made in mammalian expression systems, it is currently not possible to make
proteins with a human-like glycosylation pattern in lower eukaryotes (fungi and
yeast). Although the core oligosaccharide structure transferred to a protein in the
endoplasmic reticulum is basically identical in mammals and lower eukaryotes,
substantial differences have been found in the subsequent processing reactions
which occur in the Golgi apparatus of fungi and mammals. In fact, even
amongst different lower eukaryotes there exists a great variety of glycosylation
structures. This has prevented the use of lower eukaryotes as hosts for the
production of recombinant human glycoproteins despite otherwise notable
advantages over mammalian expression systems, such as: (1) generally higher
product titers, (2) shorter fermentation times, (3) having an alternative for proteins
that are poorly expressed in mammalian cells, (4) the ability to grow in a
chemically defined protein-free medium and thus not requiring complex
animal-derived media components, and (5) the absence of viral, especially retroviral,
infections of such hosts.

[0022] Various methylotrophic yeasts such as Pichia pastoris, Pichia
methanolica, and Hansenula polymorpha have played particularly important roles
as eukaryotic expression systems because they are able to grow to high cell
densities and secrete large quantities of recombinant protein. However, as noted
above, lower eukaryotes such as yeast do not glycosylate proteins like higher
mammals. See, for example, Martinet et al. (1998) Biotechnol. Lett. Vol. 20, No. 12,
which discloses the expression of a heterologous mannosidase in the endoplasmic
reticulum (ER).

[0023] Chiba et al. (1998) have shown that S. cerevisiae can be engineered to
provide structures ranging from Man8GlcNAc2 to Man5GlcNAc2 structures, by
eliminating 1,6 mannosyltransferase (OCH1), 1,3 mannosyltransferase (MNN1)
and a regulator of mannosylphosphatetransferase (MNN4) and by targeting the
catalytic domain of α-1,2-mannosidase I from Aspergillus saitoi into the ER of
S. cerevisiae using an ER retrieval sequence (Chiba, 1998). However, this attempt
resulted in little or no production of the desired Man5GlcNAc2, e.g., one that was
made in vivo and which could function as a substrate for GnT I (the next step in
making human-like glycan structures). Chiba et al. (1998) showed that P. pastoris
is not inherently able to produce useful quantities (greater than 5%) of
GlcNAcTransferase I accepting carbohydrate.

[0024] Maras and co-workers assert that in T. reesei "sufficient concentrations of
acceptor substrate (i.e. Man5GlcNAc2) are present"; however, when trying to
convert this acceptor substrate to GlcNAcMan5GlcNAc2 in vitro, less than 2% was
converted, thereby demonstrating the presence of Man5GlcNAc2 structures that are
not suitable precursors for complex N-glycan formation (Maras, 1997; Maras,
1999). To date no enabling disclosure exists that allows for the production of
commercially relevant quantities of GlcNAcMan5GlcNAc2 in lower eukaryotes.
[0025] It is therefore an object of the present invention to provide a system and
methods for humanizing glycosylation of recombinant glycoproteins expressed in
non-human host cells.

SUMMARY OF THE INVENTION

[0026] The present invention relates to host cells such as fungal strains having
modified lipid-linked oligosaccharides which may be modified further by
heterologous expression of a set of glycosyltransferases, sugar transporters and
mannosidases to become host strains for the production of mammalian, e.g.,
human, therapeutic glycoproteins. A protein production method has been
developed using (1) a lower eukaryotic host such as a unicellular or filamentous
fungus, or (2) any non-human eukaryotic organism that has a different
glycosylation pattern from humans, to modify the glycosylation composition and
structures of the proteins made in a host organism ("host cell") so that they
resemble more closely carbohydrate structures found in human proteins. The
process allows one to obtain an engineered host cell which can be used to express
and target any desirable gene(s) involved in glycosylation by methods that are well
established in the scientific literature and generally known to the artisan in the field
of protein expression. As described herein, host cells with modified lipid-linked
oligosaccharides are created or selected. N-glycans made in the engineered host
cells have a GlcNAcMan3GlcNAc2 core structure which may then be modified
further by heterologous expression of one or more enzymes, e.g.,
glycosyltransferases, sugar transporters and mannosidases, to yield human-like
glycoproteins. For the production of therapeutic proteins, this method may be
adapted to engineer cell lines in which any desired glycosylation structure may be
obtained.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0027] Figure 1 is a schematic of the structure of the dolichyl
pyrophosphate-linked oligosaccharide.
[0028] Figure 2 is a schematic of the generation of GlcNAc2Man3GlcNAc2
N-glycans from fungal host cells which are deficient in alg3, alg9 or alg12 activities.
[0029] Figure 3 is a schematic of processing reactions required to produce
mammalian-type oligosaccharide structures in a fungal host cell with an alg3, och1
genotype.

[0030] Figure 4 shows S. cerevisiae Alg3 Sequence Comparisons (Blast)
[0031] Figure 5 shows S. cerevisiae Alg3 and Alg3p Sequences
[0032] Figure 6 shows P. pastoris Alg3 and Alg3p Sequences
[0033] Figure 7 shows P. pastoris Alg3 Sequence Comparisons (Blast)
[0034] Figure 8 shows K. lactis Alg3 and Alg3p Sequences
[0035] Figure 9 shows K. lactis Alg3 Sequence Comparisons (Blast)
[0036] Figure 10 shows S. cerevisiae Alg9 and Alg9p Sequences
[0037] Figure 11 shows P. pastoris Alg9 and Alg9p Sequences
[0038] Figure 12 shows P. pastoris Alg9 Sequence Comparisons (Blast)
[0039] Figure 13 shows S. cerevisiae Alg12 and Alg12p Sequences
[0040] Figure 14 shows P. pastoris Alg12 and Alg12p Sequences
[0041] Figure 15 shows P. pastoris Alg12 Sequence Comparisons (Blast)
[0042] Figure 16 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in P. pastoris, showing that the predominant
N-glycan is GlcNAcMan5GlcNAc2.
[0043] Figure 17 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in P. pastoris (Fig. 16) treated with
β-N-hexosaminidase (peak corresponding to Man5GlcNAc2) to confirm that the
predominant N-glycan of Fig. 16 is GlcNAcMan5GlcNAc2.
[0044] Figure 18 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant, showing that
the predominant N-glycans are GlcNAcMan3GlcNAc2 and GlcNAcMan4GlcNAc2.
[0045] Figure 19 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase, showing that the GlcNAcMan4GlcNAc2 of Fig. 18 is converted
to GlcNAcMan3GlcNAc2.
[0046] Figure 20 is a MALDI-TOF-MS analysis of N-glycans of Fig. 19 treated
with β-N-hexosaminidase (peak corresponding to Man3GlcNAc2) to confirm that
the N-glycan of Fig. 19 is GlcNAcMan3GlcNAc2.
[0047] Figure 21 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase and GnTII, showing that the GlcNAcMan3GlcNAc2 of Fig. 19 is
converted to GlcNAc2Man3GlcNAc2.
[0048] Figure 22 is a MALDI-TOF-MS analysis of N-glycans of Fig. 21 treated
with β-N-hexosaminidase (peak corresponding to Man3GlcNAc2) to confirm that
the N-glycan of Fig. 21 is GlcNAc2Man3GlcNAc2.

[0049] Figure 23 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase and GnTII in the presence of UDP-galactose and
β1,4-galactosyltransferase, showing that the GlcNAc2Man3GlcNAc2 of Fig. 21 is
converted to Gal2GlcNAc2Man3GlcNAc2.
[0050] Figure 24 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase and GnTII in the presence of UDP-galactose and
β1,4-galactosyltransferase, and further treated with CMP-N-acetylneuraminic acid and
sialyltransferase, showing that the Gal2GlcNAc2Man3GlcNAc2 is converted to
NANA2Gal2GlcNAc2Man3GlcNAc2.

[0051] Figure 25 shows S. cerevisiae Alg6 and Alg6p Sequences
[0052] Figure 26 shows P. pastoris Alg6 and Alg6p Sequences
[0053] Figure 27 shows P. pastoris Alg6 Sequence Comparisons (Blast)
[0054] Figure 28 shows K. lactis Alg6 and Alg6p Sequences
[0055] Figure 29 shows K. lactis Alg6 Sequence Comparisons (Blast)
[0056] Figure 30 shows a model of an IgG immunoglobulin. Heavy chain and light
chain can be, based on similar secondary and tertiary structure, subdivided into
domains. The two heavy chains (domains VH, CH1, CH2 and CH3) are linked
through three disulfide bridges. The light chains (domains VL and CL) are linked by
another disulfide bridge to the CH1 portion of the heavy chain and, together with
the CH1 and VH fragments, make up the Fab region. Antigens bind to the terminal
portion of the Fab region. Effector functions, such as Fc-gamma-receptor binding,
have been localized to the CH2 domain, just downstream of the hinge region, and
are influenced by N-glycosylation of asparagine 297 in the heavy chain.
[0057] Figure 31 is a schematic overview of a modular IgG1 expression vector.
[0058] Figure 32 shows M. musculus GnT III Nucleic Acid And Amino Acid
Sequences

[0059] Figure 33 shows H. sapiens GnT IV Nucleic Acid And Amino Acid
Sequences

[0060] Figure 34 shows M. musculus GnT V Nucleic Acid And Amino Acid
Sequences
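
The MALDI-TOF-MS figures listed above (Figs. 16-24) assign N-glycans by mass. As an illustrative aid only, the sketch below computes theoretical monoisotopic masses for the compositions named in those figure descriptions. It assumes native (underivatized) glycans with a free reducing end detected as sodium adducts, uses standard monoisotopic residue masses, and treats mannose and galactose alike as hexose; the function and variable names are hypothetical, and no peak values from the figures themselves are reproduced here.

```python
# Standard monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.0528,     # mannose or galactose
    "HexNAc": 203.0794,  # N-acetylglucosamine
    "NeuAc": 291.0954,   # N-acetylneuraminic acid (NANA)
}
WATER = 18.0106       # added once for the free reducing end
SODIUM_ION = 22.9892  # Na+ adduct (assumed ionization mode)


def sodiated_mass(hex_count: int, hexnac_count: int, neuac_count: int = 0) -> float:
    """Theoretical [M+Na]+ mass of a glycan with the given composition."""
    neutral = (
        hex_count * RESIDUE_MASS["Hex"]
        + hexnac_count * RESIDUE_MASS["HexNAc"]
        + neuac_count * RESIDUE_MASS["NeuAc"]
        + WATER
    )
    return neutral + SODIUM_ION


# Compositions (Hex, HexNAc, NeuAc) for glycans named in Figs. 16-24.
GLYCANS = {
    "Man5GlcNAc2": (5, 2, 0),
    "GlcNAcMan5GlcNAc2": (5, 3, 0),
    "GlcNAcMan3GlcNAc2": (3, 3, 0),
    "GlcNAc2Man3GlcNAc2": (3, 4, 0),
    "Gal2GlcNAc2Man3GlcNAc2": (5, 4, 0),
    "NANA2Gal2GlcNAc2Man3GlcNAc2": (5, 4, 2),
}

for name, composition in GLYCANS.items():
    print(f"{name}: {sodiated_mass(*composition):.2f}")
```

Under these assumptions, removal of a terminal GlcNAc by β-N-hexosaminidase corresponds to a loss of one HexNAc residue mass, which is the kind of shift the hexosaminidase digests in Figs. 17, 20 and 22 are used to confirm.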

DETAILED DESCRIPTION OF THE INVENTION 

[0061] Unless otherwise defined herein, scientific and technical terms used in
connection with the present invention shall have the meanings that are commonly
understood by those of ordinary skill in the art. Further, unless otherwise required
by context, singular terms shall include pluralities and plural terms shall include
the singular. The methods and techniques of the present invention are generally
performed according to conventional methods well known in the art. Generally,
nomenclatures used in connection with, and techniques of, biochemistry,
enzymology, molecular and cellular biology, microbiology, genetics and protein
and nucleic acid chemistry and hybridization described herein are those well
known and commonly used in the art. The methods and techniques of the present
invention are generally performed according to conventional methods well known
in the art and as described in various general and more specific references that are
cited and discussed throughout the present specification unless otherwise indicated.
See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); Ausubel et al.,
Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and
Supplements to 2002); Harlow and Lane, Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1990); Introduction to
Glycobiology, Maureen E. Taylor, Kurt Drickamer, Oxford Univ. Press (2003);
Worthington Enzyme Manual, Worthington Biochemical Corp., Freehold, NJ;
Handbook of Biochemistry: Section A Proteins, Vol I, 1976, CRC Press; Handbook
of Biochemistry: Section A Proteins, Vol II, 1976, CRC Press; Essentials of
Glycobiology, Cold Spring Harbor Laboratory Press (1999). The nomenclatures
used in connection with, and the laboratory procedures and techniques of,
biochemistry and molecular biology described herein are those well known and
commonly used in the art.

[0062] All publications, patents and other references mentioned herein are 
incorporated by reference. 

[0063] The following terms, unless otherwise indicated, shall be understood to
have the following meanings:

[0064] As used herein, the term "N-glycan" refers to an N-linked
oligosaccharide, e.g., one that is attached by an asparagine-N-acetylglucosamine
linkage to an asparagine residue of a polypeptide. N-glycans have a common
pentasaccharide core of Man3GlcNAc2 ("Man" refers to mannose; "Glc" refers to
glucose; and "NAc" refers to N-acetyl; GlcNAc refers to N-acetylglucosamine).
N-glycans differ with respect to the number of branches (antennae) comprising
peripheral sugars (e.g., fucose and sialic acid) that are added to the Man3GlcNAc2
("Man3") core structure. N-glycans are classified according to their branched
constituents (e.g., high mannose, complex or hybrid). A "high mannose" type
N-glycan has five or more mannose residues. A "complex" type N-glycan typically
has at least one GlcNAc attached to the 1,3 mannose arm and at least one GlcNAc
attached to the 1,6 mannose arm of a "trimannose" core. The "trimannose core" is
the pentasaccharide core having a Man3 structure. Complex N-glycans may also
have galactose ("Gal") residues that are optionally modified with sialic acid or
derivatives ("NeuAc", where "Neu" refers to neuraminic acid and "Ac" refers to
acetyl). Complex N-glycans may also have intrachain substitutions comprising
"bisecting" GlcNAc and core fucose ("Fuc"). A "hybrid" N-glycan has at least
one GlcNAc on the terminal of the 1,3 mannose arm of the trimannose core and
zero or more mannoses on the 1,6 mannose arm of the trimannose core.

[0065] Abbreviations used herein are of common usage in the art, see, e.g.,
abbreviations of sugars, above. Other common abbreviations include "PNGase",
which refers to peptide N-glycosidase F (EC 3.2.2.18); "GlcNAc Tr (I - III)",
which refers to one of three N-acetylglucosaminyltransferase enzymes; "NANA"
refers to N-acetylneuraminic acid.

[0066] As used herein, the term "secretion pathway" refers to the assembly line 
of various glycosylation enzymes to which a lipid-linked oligosaccharide precursor 
and an N-glycan substrate are sequentially exposed, following the molecular flow
of a nascent polypeptide chain from the cytoplasm to the endoplasmic reticulum
(ER) and the compartments of the Golgi apparatus. Enzymes are said to be
localized along this pathway. An enzyme X that acts on a lipid-linked glycan or an
N-glycan before enzyme Y is said to be or to act "upstream" to enzyme Y;
similarly, enzyme Y is or acts "downstream" from enzyme X.

[0067] As used herein, the term "alg X activity" refers to the enzymatic activity
encoded by the "alg X" gene, and to an enzyme having that enzymatic activity
encoded by a homologous gene or gene product (see below) or by an unrelated
gene or gene product.

[0068] As used herein, the term "antibody" refers to a full antibody (consisting
of two heavy chains and two light chains) or a fragment thereof. Such fragments
include, but are not limited to, those produced by digestion with various proteases,
those produced by chemical cleavage and/or chemical dissociation, and those
produced recombinantly, so long as the fragment remains capable of specific
binding to an antigen. Among these fragments are Fab, Fab', F(ab')2, and single
chain Fv (scFv) fragments. Within the scope of the term "antibody" are also
antibodies that have been modified in sequence, but remain capable of specific
binding to an antigen. Examples of modified antibodies are interspecies chimeric
and humanized antibodies; antibody fusions; and heteromeric antibody complexes,
such as diabodies (bispecific antibodies), single-chain diabodies, and intrabodies
(see, e.g., Marasco (ed.), Intracellular Antibodies: Research and Disease
Applications, Springer-Verlag New York, Inc. (1998) (ISBN: 3540641513), the
disclosure of which is incorporated herein by reference in its entirety).

[0069] As used herein, the term "mutation" refers to any change in the nucleic
acid or amino acid sequence of a gene product, e.g., of a glycosylation-related
enzyme.



WO 03/056914 PCT/US02/41510 



[0070] The term "polynucleotide" or "nucleic acid molecule" refers to a
polymeric form of nucleotides of at least 10 bases in length. The term includes
DNA molecules (e.g., cDNA or genomic or synthetic DNA) and RNA molecules
(e.g., mRNA or synthetic RNA), as well as analogs of DNA or RNA containing
non-natural nucleotide analogs, non-native internucleoside bonds, or both. The
nucleic acid can be in any topological conformation. For instance, the nucleic acid
can be single-stranded, double-stranded, triple-stranded, quadruplexed, partially
double-stranded, branched, hairpinned, circular, or in a padlocked conformation.
The term includes single and double stranded forms of DNA.

[0071] Unless otherwise indicated, a "nucleic acid comprising SEQ ID NO:X"
refers to a nucleic acid, at least a portion of which has either (i) the sequence of
SEQ ID NO:X, or (ii) a sequence complementary to SEQ ID NO:X. The choice
between the two is dictated by the context. For instance, if the nucleic acid is used
as a probe, the choice between the two is dictated by the requirement that the probe
be complementary to the desired target.
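
The two alternatives in this definition can be illustrated with a minimal Python sketch. The helper names `reverse_complement` and `comprises` are hypothetical and are not part of the specification; the sketch assumes plain DNA strings containing only A, C, G and T and treats "complementary" as the reverse complement read 5' to 3'.

```python
# Hypothetical illustration only; function names are not from the specification.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def comprises(nucleic_acid: str, seq_id: str) -> bool:
    """True if a portion of nucleic_acid has (i) the given sequence
    or (ii) a sequence complementary to it."""
    target = nucleic_acid.upper()
    query = seq_id.upper()
    return query in target or reverse_complement(query) in target
```

For a probe, alternative (ii) is the relevant one, since the probe must pair with the target strand.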

[0072] An "isolated" or "substantially pure" nucleic acid or polynucleotide (e.g.,
an RNA, DNA or a mixed polymer) is one which is substantially separated from
other cellular components that naturally accompany the native polynucleotide in its
natural host cell, e.g., ribosomes, polymerases, and genomic sequences with which
it is naturally associated. The term embraces a nucleic acid or polynucleotide that
(1) has been removed from its naturally occurring environment, (2) is not
associated with all or a portion of a polynucleotide in which the "isolated
polynucleotide" is found in nature, (3) is operatively linked to a polynucleotide
which it is not linked to in nature, or (4) does not occur in nature. The term
"isolated" or "substantially pure" also can be used in reference to recombinant or
cloned DNA isolates, chemically synthesized polynucleotide analogs, or
polynucleotide analogs that are biologically synthesized by heterologous systems.

[0073] However, "isolated" does not necessarily require that the nucleic acid or
polynucleotide so described has itself been physically removed from its native
environment. For instance, an endogenous nucleic acid sequence in the genome of
an organism is deemed "isolated" herein if a heterologous sequence (i.e., a
sequence that is not naturally adjacent to this endogenous nucleic acid sequence) is
placed adjacent to the endogenous nucleic acid sequence, such that the expression
of this endogenous nucleic acid sequence is altered. By way of example, a
non-native promoter sequence can be substituted (e.g., by homologous recombination)
for the native promoter of a gene in the genome of a human cell, such that this
gene has an altered expression pattern. This gene would now become "isolated"
because it is separated from at least some of the sequences that naturally flank it.

[0074] A nucleic acid is also considered "isolated" if it contains any
modifications that do not naturally occur to the corresponding nucleic acid in a
genome. For instance, an endogenous coding sequence is considered "isolated" if
it contains an insertion, deletion or a point mutation introduced artificially, e.g., by
human intervention. An "isolated nucleic acid" also includes a nucleic acid
integrated into a host cell chromosome at a heterologous site and a nucleic acid
construct present as an episome. Moreover, an "isolated nucleic acid" can be
substantially free of other cellular material, or substantially free of culture medium
when produced by recombinant techniques, or substantially free of chemical
precursors or other chemicals when chemically synthesized.

[0075] As used herein, the phrase "degenerate variant" of a reference nucleic
acid sequence encompasses nucleic acid sequences that can be translated,
according to the standard genetic code, to provide an amino acid sequence identical
to that translated from the reference nucleic acid sequence.
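
As a rough illustration of the "degenerate variant" definition, the following Python sketch translates two coding sequences with the standard genetic code and compares the results. The names `translate` and `is_degenerate_variant` are hypothetical; the sketch assumes in-frame DNA strings of A, C, G and T whose length is a multiple of three.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(reference: str, candidate: str) -> bool:
    """True if both sequences encode the identical amino acid sequence."""
    return translate(reference) == translate(candidate)
```

For example, GCT and GCC both encode alanine, so a candidate differing from the reference only at such a position would qualify under this sketch.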

[0076] The term "percent sequence identity" or "identical" in the context of
nucleic acid sequences refers to the residues in the two sequences which are the
same when aligned for maximum correspondence. The length of sequence identity
comparison may be over a stretch of at least about nine nucleotides, usually at least
about 20 nucleotides, more usually at least about 24 nucleotides, typically at least
about 28 nucleotides, more typically at least about 32 nucleotides, and preferably
at least about 36 or more nucleotides. There are a number of different algorithms
known in the art which can be used to measure nucleotide sequence identity. For
instance, polynucleotide sequences can be compared using FASTA, Gap or Bestfit,
which are programs in Wisconsin Package Version 10.0, Genetics Computer
Group (GCG), Madison, Wisconsin. FASTA provides alignments and percent
sequence identity of the regions of the best overlap between the query and search
sequences (Pearson, 1990, herein incorporated by reference). For instance,
percent sequence identity between nucleic acid sequences can be determined using
FASTA with its default parameters (a word size of 6 and the NOPAM factor for
the scoring matrix) or using Gap with its default parameters as provided in GCG
Version 6.1, herein incorporated by reference.
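
The arithmetic behind a percent identity figure can be sketched as follows. This is not the FASTA, Gap or Bestfit implementation; it is a minimal illustration that assumes the two sequences have already been aligned (gaps written as "-") and that counts identity over gap-free columns only. Real programs differ in how they choose the denominator, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

With the alignment `ACGT-ACGT` against `ACGTTACCT`, eight columns are gap-free and seven match, giving 87.5%.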

[0077] The term "substantial homology" or "substantial similarity," when
referring to a nucleic acid or fragment thereof, indicates that, when optimally
aligned with appropriate nucleotide insertions or deletions with another nucleic
acid (or its complementary strand), there is nucleotide sequence identity in at least
about 50%, more preferably 60% of the nucleotide bases, usually at least about
70%, more usually at least about 80%, preferably at least about 90%, and more
preferably at least about 95%, 96%, 97%, 98% or 99% of the nucleotide bases, as
measured by any well-known algorithm of sequence identity, such as FASTA,
BLAST or Gap, as discussed above.

[0078] Alternatively, substantial homology or similarity exists when a nucleic
acid or fragment thereof hybridizes to another nucleic acid, to a strand of another
nucleic acid, or to the complementary strand thereof, under stringent hybridization
conditions. "Stringent hybridization conditions" and "stringent wash conditions"
in the context of nucleic acid hybridization experiments depend upon a number of
different physical parameters. Nucleic acid hybridization will be affected by such
conditions as salt concentration, temperature, solvents, the base composition of the
hybridizing species, length of the complementary regions, and the number of
nucleotide base mismatches between the hybridizing nucleic acids, as will be
readily appreciated by those skilled in the art. One having ordinary skill in the art
knows how to vary these parameters to achieve a particular stringency of
hybridization.

[0079] In general, "stringent hybridization" is performed at about 25°C below the
thermal melting point (Tm) for the specific DNA hybrid under a particular set of
conditions. "Stringent washing" is performed at temperatures about 5°C lower
than the Tm for the specific DNA hybrid under a particular set of conditions. The
Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly
matched probe. See Sambrook et al., supra, page 9.51, hereby incorporated by
reference. For purposes herein, "high stringency conditions" are defined for
solution phase hybridization as aqueous hybridization (i.e., free of formamide) in
6X SSC (where 20X SSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% SDS
at 65°C for 8-12 hours, followed by two washes in 0.2X SSC, 0.1% SDS at 65°C
for 20 minutes. It will be appreciated by the skilled worker that hybridization at
65°C will occur at different rates depending on a number of factors including the
length and percent identity of the sequences which are hybridizing.
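
The temperature offsets and SSC dilutions stated above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, the Tm value is an input that must be determined separately for the specific hybrid, and the 25°C and 5°C offsets are the approximate figures given in the text.

```python
# 20X SSC composition as stated in the text (molar concentrations).
SSC_20X = {"NaCl_M": 3.0, "sodium_citrate_M": 0.3}


def stringent_temperatures(tm_celsius: float) -> dict:
    """Approximate hybridization and wash temperatures for a given Tm."""
    return {
        "hybridization_c": tm_celsius - 25.0,  # about 25 C below Tm
        "wash_c": tm_celsius - 5.0,            # about 5 C below Tm
    }


def ssc_molarity(fold: float) -> dict:
    """Molar composition of an SSC solution at the given fold strength."""
    return {name: conc * fold / 20.0 for name, conc in SSC_20X.items()}
```

Under these figures, 6X SSC corresponds to about 0.9 M NaCl and 0.09 M sodium citrate, and the 0.2X SSC wash to about 0.03 M NaCl and 0.003 M sodium citrate.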
[0080] The nucleic acids (also referred to as polynucleotides) of this invention
may include both sense and antisense strands of RNA, cDNA, genomic DNA, and
synthetic forms and mixed polymers of the above. They may be modified
chemically or biochemically or may contain non-natural or derivatized nucleotide
bases, as will be readily appreciated by those of skill in the art. Such modifications
include, for example, labels, methylation, substitution of one or more of the
naturally occurring nucleotides with an analog, internucleotide modifications such
as uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates,
phosphorodithioates, etc.), pendent moieties (e.g., polypeptides), intercalators (e.g.,
acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g., alpha
anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic
polynucleotides in their ability to bind to a designated sequence via hydrogen
bonding and other chemical interactions. Such molecules are known in the art and
include, for example, those in which peptide linkages substitute for phosphate
linkages in the backbone of the molecule.

[0081] The term "mutated" when applied to nucleic acid sequences means that
nucleotides in a nucleic acid sequence may be inserted, deleted or changed
compared to a reference nucleic acid sequence. A single alteration may be made at
a locus (a point mutation) or multiple nucleotides may be inserted, deleted or
changed at a single locus. In addition, one or more alterations may be made at any
number of loci within a nucleic acid sequence. A nucleic acid sequence may be
mutated by any method known in the art including but not limited to mutagenesis
techniques such as "error-prone PCR" (a process for performing PCR under
conditions where the copying fidelity of the DNA polymerase is low, such that a
high rate of point mutations is obtained along the entire length of the PCR product.
See, e.g., Leung, D. W., et al., Technique, 1, pp. 11-15 (1989) and Caldwell, R. C.
& Joyce G. F., PCR Methods Applic., 2, pp. 28-33 (1992)); and "oligonucleotide-
directed mutagenesis" (a process which enables the generation of site-specific
mutations in any cloned DNA segment of interest. See, e.g., Reidhaar-Olson, J. F.
& Sauer, R. T., et al., Science, 241, pp. 53-57 (1988)).

[0082] The term "vector" as used herein is intended to refer to a nucleic acid
molecule capable of transporting another nucleic acid to which it has been linked.
One type of vector is a "plasmid", which refers to a circular double stranded DNA
loop into which additional DNA segments may be ligated. Other vectors include
cosmids, bacterial artificial chromosomes (BAC) and yeast artificial chromosomes
(YAC). Another type of vector is a viral vector, wherein additional DNA segments
may be ligated into the viral genome (discussed in more detail below). Certain
vectors are capable of autonomous replication in a host cell into which they are
introduced (e.g., vectors having an origin of replication which functions in the host
cell). Other vectors can be integrated into the genome of a host cell upon
introduction into the host cell, and are thereby replicated along with the host
genome. Moreover, certain preferred vectors are capable of directing the
expression of genes to which they are operatively linked. Such vectors are referred
to herein as "recombinant expression vectors" (or simply, "expression vectors").

[0083] "Operatively linked" expression control sequences refers to a linkage in
which the expression control sequence is contiguous with the gene of interest to
control the gene of interest, as well as expression control sequences that act in
trans or at a distance to control the gene of interest.

[0084] The term "expression control sequence" as used herein refers to
polynucleotide sequences which are necessary to affect the expression of coding
sequences to which they are operatively linked. Expression control sequences are
sequences which control the transcription, post-transcriptional events and
translation of nucleic acid sequences. Expression control sequences include
appropriate transcription initiation, termination, promoter and enhancer sequences;
efficient RNA processing signals such as splicing and polyadenylation signals;
sequences that stabilize cytoplasmic mRNA; sequences that enhance translation
efficiency (e.g., ribosome binding sites); sequences that enhance protein stability;
and when desired, sequences that enhance protein secretion. The nature of such
control sequences differs depending upon the host organism; in prokaryotes, such
control sequences generally include promoter, ribosomal binding site, and
transcription termination sequence. The term "control sequences" is intended to
include, at a minimum, all components whose presence is essential for expression,
and can also include additional components whose presence is advantageous, for
example, leader sequences and fusion partner sequences.

[0085] The term "recombinant host cell" (or simply "host cell"), as used herein,
is intended to refer to a cell into which a recombinant vector has been introduced.
It should be understood that such terms are intended to refer not only to the
particular subject cell but to the progeny of such a cell. Because certain
modifications may occur in succeeding generations due to either mutation or
environmental influences, such progeny may not, in fact, be identical to the parent
cell, but are still included within the scope of the term "host cell" as used herein. A
recombinant host cell may be an isolated cell or cell line grown in culture or may
be a cell which resides in a living tissue or organism.

[0086] The term "peptide" as used herein refers to a short polypeptide, e.g., one
that is typically less than about 50 amino acids long and more typically less than
about 30 amino acids long. The term as used herein encompasses analogs and
mimetics that mimic structural and thus biological function.

[0087] The term "polypeptide" encompasses both naturally-occurring and
non-naturally-occurring proteins, and fragments, mutants, derivatives and analogs
thereof. A polypeptide may be monomeric or polymeric. Further, a polypeptide
may comprise a number of different domains each of which has one or more
distinct activities.

[0088] The term "isolated protein" or "isolated polypeptide" is a protein or
polypeptide that by virtue of its origin or source of derivation (1) is not associated
with naturally associated components that accompany it in its native state, (2)
when it exists in a purity not found in nature, where purity can be adjudged with
respect to the presence of other cellular material (e.g., is free of other proteins from
the same species), (3) is expressed by a cell from a different species, or (4) does not
occur in nature (e.g., it is a fragment of a polypeptide found in nature or it includes
amino acid analogs or derivatives not found in nature or linkages other than
standard peptide bonds). Thus, a polypeptide that is chemically synthesized or
synthesized in a cellular system different from the cell from which it naturally
originates will be "isolated" from its naturally associated components. A
polypeptide or protein may also be rendered substantially free of naturally
associated components by isolation, using protein purification techniques well
known in the art. As thus defined, "isolated" does not necessarily require that the
protein, polypeptide, peptide or oligopeptide so described has been physically
removed from its native environment.

[0089] The term "polypeptide fragment" as used herein refers to a polypeptide
that has an amino-terminal and/or carboxy-terminal deletion compared to a
full-length polypeptide. In a preferred embodiment, the polypeptide fragment is a
contiguous sequence in which the amino acid sequence of the fragment is identical
to the corresponding positions in the naturally-occurring sequence. Fragments
typically are at least 5, 6, 7, 8, 9 or 10 amino acids long, preferably at least 12, 14,
16 or 18 amino acids long, more preferably at least 20 amino acids long, more
preferably at least 25, 30, 35, 40 or 45 amino acids, even more preferably at least
50 or 60 amino acids long, and even more preferably at least 70 amino acids long.

[0090] A "modified derivative" refers to polypeptides or fragments thereof that
are substantially homologous in primary structural sequence but which include,
e.g., in vivo or in vitro chemical and biochemical modifications or which
incorporate amino acids that are not found in the native polypeptide. Such
modifications include, for example, acetylation, carboxylation, phosphorylation,
glycosylation, ubiquitination, labeling, e.g., with radionuclides, and various
enzymatic modifications, as will be readily appreciated by those well skilled in the
art. A variety of methods for labeling polypeptides and of substituents or labels
useful for such purposes are well known in the art, and include radioactive isotopes
such as 125I, 32P, 35S, and 3H, ligands which bind to labeled antiligands (e.g.,
antibodies), fluorophores, chemiluminescent agents, enzymes, and antiligands
which can serve as specific binding pair members for a labeled ligand. The choice
of label depends on the sensitivity required, ease of conjugation with the primer,
stability requirements, and available instrumentation. Methods for labeling
polypeptides are well known in the art. See Ausubel et al., 1992, hereby
incorporated by reference.

[0091] The term "fusion protein" refers to a polypeptide comprising a
polypeptide or fragment coupled to heterologous amino acid sequences. Fusion
proteins are useful because they can be constructed to contain two or more desired
functional elements from two or more different proteins. A fusion protein
comprises at least 10 contiguous amino acids from a polypeptide of interest, more
preferably at least 20 or 30 amino acids, even more preferably at least 40, 50 or 60
amino acids, yet more preferably at least 75, 100 or 125 amino acids. Fusion
proteins can be produced recombinantly by constructing a nucleic acid sequence
which encodes the polypeptide or a fragment thereof in frame with a nucleic acid
sequence encoding a different protein or peptide and then expressing the fusion
protein. Alternatively, a fusion protein can be produced chemically by
crosslinking the polypeptide or a fragment thereof to another protein.

[0092] The term "non-peptide analog" refers to a compound with properties that
are analogous to those of a reference polypeptide. A non-peptide compound may
also be termed a "peptide mimetic" or a "peptidomimetic". See, e.g., Jones, (1992)
Amino Acid and Peptide Synthesis, Oxford University Press; Jung, (1997)
Combinatorial Peptide and Nonpeptide Libraries: A Handbook, John Wiley;
Bodanszky et al., (1993) Peptide Chemistry-A Practical Textbook, Springer
Verlag; "Synthetic Peptides: A Users Guide", G. A. Grant, Ed., W. H. Freeman and
Co., 1992; Evans et al., J. Med. Chem. 30:1229 (1987); Fauchere, J. Adv. Drug Res.
15:29 (1986); Veber and Freidinger, TINS p.392 (1985); and references cited in
each of the above, which are incorporated herein by reference. Such compounds
are often developed with the aid of computerized molecular modeling. Peptide
mimetics that are structurally similar to useful peptides of the invention may be
used to produce an equivalent effect and are therefore envisioned to be part of the
invention.

[0093] A "polypeptide mutant" or "mutein" refers to a polypeptide whose
sequence contains an insertion, duplication, deletion, rearrangement or substitution
of one or more amino acids compared to the amino acid sequence of a native or
wild type protein. A mutein may have one or more amino acid point substitutions,
in which a single amino acid at a position has been changed to another amino acid,
one or more insertions and/or deletions, in which one or more amino acids are
inserted or deleted, respectively, in the sequence of the naturally-occurring protein,
and/or truncations of the amino acid sequence at either or both the amino or
carboxy termini. A mutein may have the same but preferably has a different
biological activity compared to the naturally-occurring protein. For instance, a
mutein may have an increased or decreased neuron or NgR binding activity. In a
preferred embodiment of the present invention, a MAG derivative that is a mutein
(e.g., in MAG Ig-like domain 5) has decreased neuronal growth inhibitory activity
compared to endogenous or soluble wild-type MAG.

[0094] A mutein has at least 70% overall sequence homology to its wild-type
counterpart. Even more preferred are muteins having 80%, 85% or 90% overall
sequence homology to the wild-type protein. In an even more preferred
embodiment, a mutein exhibits 95% sequence identity, even more preferably 97%,
even more preferably 98% and even more preferably 99% overall sequence
identity. Sequence homology may be measured by any common sequence analysis
algorithm, such as Gap or Bestfit.

[0095] Preferred amino acid substitutions are those which: (1) reduce
susceptibility to proteolysis, (2) reduce susceptibility to oxidation, (3) alter binding
affinity for forming protein complexes, (4) alter binding affinity or enzymatic 
activity, and (5) confer or modify other physicochemical or functional properties of 
such analogs. 

[0096] As used herein, the twenty conventional amino acids and their
abbreviations follow conventional usage. See Immunology - A Synthesis (2nd
Edition, E.S. Golub and D.R. Gren, Eds., Sinauer Associates, Sunderland, Mass.
(1991)), which is incorporated herein by reference. Stereoisomers (e.g., D-amino
acids) of the twenty conventional amino acids, unnatural amino acids such as α-,
α-disubstituted amino acids, N-alkyl amino acids, and other unconventional amino
acids may also be suitable components for polypeptides of the present invention.
Examples of unconventional amino acids include: 4-hydroxyproline,
γ-carboxyglutamate, ε-N,N,N-trimethyllysine, ε-N-acetyllysine, O-phosphoserine,
N-acetylserine, N-formylmethionine, 3-methylhistidine, 5-hydroxylysine,
σ-N-methylarginine, and other similar amino acids and imino acids (e.g.,
4-hydroxyproline). In the polypeptide notation used herein, the left-hand direction
is the amino terminal direction and the right hand direction is the carboxy-terminal
direction, in accordance with standard usage and convention.

[0097] A protein has "homology" or is "homologous" to a second protein if the
nucleic acid sequence that encodes the protein has a similar sequence to the nucleic
acid sequence that encodes the second protein. Alternatively, a protein has
homology to a second protein if the two proteins have "similar" amino acid
sequences. (Thus, the term "homologous proteins" is defined to mean that the two
proteins have similar amino acid sequences). In a preferred embodiment, a
homologous protein is one that exhibits 60% sequence homology to the wild type
protein, more preferred is 70% sequence homology. Even more preferred are
homologous proteins that exhibit 80%, 85% or 90% sequence homology to the
wild type protein. In a yet more preferred embodiment, a homologous protein
exhibits 95%, 97%, 98% or 99% sequence identity. As used herein, homology
between two regions of amino acid sequence (especially with respect to predicted
structural similarities) is interpreted as implying similarity in function.

[0098] When "homologous" is used in reference to proteins or peptides, it is
recognized that residue positions that are not identical often differ by conservative
amino acid substitutions. A "conservative amino acid substitution" is one in which
an amino acid residue is substituted by another amino acid residue having a side
chain (R group) with similar chemical properties (e.g., charge or hydrophobicity).
In general, a conservative amino acid substitution will not substantially change the
functional properties of a protein. In cases where two or more amino acid
sequences differ from each other by conservative substitutions, the percent
sequence identity or degree of homology may be adjusted upwards to correct for
the conservative nature of the substitution. Means for making this adjustment are
well known to those of skill in the art (see, e.g., Pearson et al., 1994, herein
incorporated by reference).

[0099] The following six groups each contain amino acids that are conservative
substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D),
Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine
(K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V), and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
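
The six groups listed above can be encoded directly as a lookup, as in the following Python sketch. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the sketch treats a substitution as conservative only when two different one-letter residues fall in the same listed group; residues not listed (e.g., G, P, C, H) are therefore never reported as conservative substitutions here.

```python
# The six conservative substitution groups, by one-letter code.
CONSERVATIVE_GROUPS = [
    set("ST"),     # 1) Serine, Threonine
    set("DE"),     # 2) Aspartic Acid, Glutamic Acid
    set("NQ"),     # 3) Asparagine, Glutamine
    set("RK"),     # 4) Arginine, Lysine
    set("ILMAV"),  # 5) Isoleucine, Leucine, Methionine, Alanine, Valine
    set("FYW"),    # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(residue_a: str, residue_b: str) -> bool:
    """True if two different residues belong to the same listed group."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

A check of this kind could be used when adjusting a homology score upwards for conservative differences, although the adjustment method itself is not specified here.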

[0100] Sequence homology for polypeptides, which is also referred to as percent
sequence identity, is typically measured using sequence analysis software. See,
e.g., the Sequence Analysis Software Package of the Genetics Computer Group
(GCG), University of Wisconsin Biotechnology Center, 910 University Avenue,
Madison, Wisconsin 53705. Protein analysis software matches similar sequences
using measures of homology assigned to various substitutions, deletions and other
modifications, including conservative amino acid substitutions. For instance, GCG
contains programs such as "Gap" and "Bestfit" which can be used with default
parameters to determine sequence homology or sequence identity between closely
related polypeptides, such as homologous polypeptides from different species of
organisms or between a wild type protein and a mutein thereof. See, e.g., GCG
Version 6.1.

[0101] A preferred algorithm when comparing an inhibitory molecule sequence to
a database containing a large number of sequences from different organisms is the
computer program BLAST (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410;
Gish and States (1993) Nature Genet. 3:266-272; Madden, T.L. et al. (1996) Meth.
Enzymol. 266:131-141; Altschul, S.F. et al. (1997) Nucleic Acids Res. 25:3389-
3402; Zhang, J. and Madden, T.L. (1997) Genome Res. 7:649-656), especially
blastp or tblastn (Altschul et al., 1997). Preferred parameters for BLASTp are:
Expectation value: 10 (default)
Filter: seg (default)
Cost to open a gap: 11 (default)
Cost to extend a gap: 1 (default)
Max. alignments: 100 (default)
Word size: 11 (default)
No. of descriptions: 100 (default)
Penalty Matrix: BLOSUM62

[0102] The length of polypeptide sequences compared for homology will
generally be at least about 16 amino acid residues, usually at least about 20
residues, more usually at least about 24 residues, typically at least about 28
residues, and preferably more than about 35 residues. When searching a database
containing sequences from a large number of different organisms, it is preferable to
compare amino acid sequences. Database searching using amino acid sequences
can be measured by algorithms other than blastp known in the art. For instance,
polypeptide sequences can be compared using FASTA, a program in GCG Version
6.1. FASTA provides alignments and percent sequence identity of the regions of
the best overlap between the query and search sequences (Pearson, 1990, herein
incorporated by reference). For example, percent sequence identity between amino
acid sequences can be determined using FASTA with its default parameters (a
word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1,
herein incorporated by reference.
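
As a rough illustration of what "percent sequence identity" means once an alignment is in hand, the following Python sketch counts identical positions in two pre-aligned sequences. It does not reproduce FASTA, BLAST or GCG scoring. The function name `percent_identity`, the use of "-" as a gap character, and the choice of denominator (all aligned columns except those that are gaps in both sequences) are assumptions made for this example; other denominator conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must already be aligned and of equal length, with '-'
    marking gaps. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)
```

For example, aligning "AC-E" against "ACDE" gives three identical columns out of four under this convention.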

[0103] "Specific binding" refers to the ability of two molecules to bind to each
other in preference to binding to other molecules in the environment. Typically,
"specific binding" discriminates over adventitious binding in a reaction by at least
two-fold, more typically by at least 10-fold, often at least 100-fold. Typically, the
affinity or avidity of a specific binding reaction is at least about 10^-7 M (e.g., at
least about 10^-8 M or 10^-9 M).

[0104] The term "region" as used herein refers to a physically contiguous portion
of the primary structure of a biomolecule. In the case of proteins, a region is
defined by a contiguous portion of the amino acid sequence of that protein.

[0105] The term "domain" as used herein refers to a structure of a biomolecule
that contributes to a known or suspected function of the biomolecule. Domains
may be co-extensive with regions or portions thereof; domains may also include
distinct, non-contiguous regions of a biomolecule. Examples of protein domains
include, but are not limited to, an Ig domain, an extracellular domain, a
transmembrane domain, and a cytoplasmic domain.

[0106] As used herein, the term "molecule" means any compound, including, but
not limited to, a small molecule, peptide, protein, sugar, nucleotide, nucleic acid,
lipid, etc., and such a compound can be natural or synthetic.

[0107] Unless otherwise defined, all technical and scientific terms used herein 
have the same meaning as commonly understood by one of ordinary skill in the art 






WO 03/056914 PCT/US02/41510

to which this invention pertains. Exemplary methods and materials are described
below, although methods and materials similar or equivalent to those described
herein can also be used in the practice of the present invention and will be apparent
to those of skill in the art. All publications and other references mentioned herein
are incorporated by reference in their entirety. In case of conflict, the present
specification, including definitions, will control. The materials, methods, and
examples are illustrative only and not intended to be limiting.

[0108] Throughout this specification and claims, the word "comprise" or
variations such as "comprises" or "comprising", will be understood to imply the
inclusion of a stated integer or group of integers but not the exclusion of any other
integer or group of integers.

Engineering or Selecting Hosts With Modified Lipid-Linked Oligosaccharides 
For The Generation of Human-like N-Glycans 

[0109] The invention provides a method for producing a human-like glycoprotein
in a non-human eukaryotic host cell. The method involves making or using a
non-human eukaryotic host cell diminished or depleted in an alg gene activity (i.e., alg
activities, including equivalent enzymatic activities in non-fungal host cells) and
introducing into the host cell at least one glycosidase activity. In a preferred
embodiment, the glycosidase activity is introduced by causing expression of one or
more mannosidase activities within the host cell, for example, by activation of a
mannosidase activity, or by expression from a nucleic acid molecule of a
mannosidase activity, in the host cell.

[0110] In another embodiment, the method involves making or using a host cell
diminished or depleted in the activity of one or more enzymes that transfer a sugar
residue to the 1,6 arm of lipid-linked oligosaccharide precursors (Fig. 1). A host
cell of the invention is selected for or is engineered by introducing a mutation in
one or more of the genes encoding an enzyme that transfers a sugar residue to (e.g.,
mannosylates) the 1,6 arm of a lipid-linked oligosaccharide precursor. The sugar
residue is preferably a glucose, GlcNAc, galactose, sialic acid, fucose or GlcNAc
phosphate residue, and is more preferably mannose. In a preferred embodiment, the
activity of one or more enzymes that mannosylate the 1,6 arm of lipid-linked
oligosaccharide precursors is diminished or depleted. The method may further
comprise the step of introducing into the host cell at least one glycosidase activity
(see below).

[0111] In yet another embodiment, the invention provides a method for
producing a human-like glycoprotein in a non-human host, wherein the
glycoprotein comprises an N-glycan having at least two GlcNAcs attached to a
trimannose core structure.

[0112] In each above embodiment, the method is directed to making a host cell
in which the lipid-linked oligosaccharide precursors are enriched in ManXGlcNAc2
structures, where X is 3, 4 or 5 (Fig. 2). These structures are transferred in the ER
of the host cell onto nascent polypeptide chains by an oligosaccharyl-transferase
and may then be processed by treatment with glycosidases (e.g., α-mannosidases)
and glycosyltransferases (e.g., GnT I) to produce N-glycans having
GlcNAcManXGlcNAc2 core structures, wherein X is 3, 4 or 5, and is preferably 3
(Figs. 2 and 3). As shown in Fig. 2, N-glycans having a GlcNAcManXGlcNAc2
core structure where X is greater than 3 may be converted to
GlcNAcMan3GlcNAc2, e.g., by treatment with an α-1,3 and/or α-1,2-1,3
mannosidase activity, where applicable.

[0113] Additional processing of GlcNAcMan3GlcNAc2 by treatment with
glycosyltransferases (e.g., GnT II) produces GlcNAc2Man3GlcNAc2 core structures
which may then be modified, as desired, e.g., by ex vivo treatment or by
heterologous expression in the host cell of a set of glycosylation enzymes,
including glycosyltransferases, sugar transporters and mannosidases (see below),
to become human-like N-glycans. Preferred human-like glycoproteins which may
be produced according to the invention include those which comprise N-glycans
having seven or fewer, or three or fewer, mannose residues; comprise one or more
sugars selected from the group consisting of galactose, GlcNAc, sialic acid, and
fucose; and comprise at least one oligosaccharide branch comprising the structure
NeuNAc-Gal-GlcNAc-Man.
[0114] In one embodiment, the host cell has diminished or depleted
Dol-P-Man:Man5GlcNAc2-PP-Dol Mannosyltransferase activity, which is an activity
involved in the first mannosylation step from Man5GlcNAc2-PP-Dol to
Man6GlcNAc2-PP-Dol at the luminal side of the ER (e.g., ALG3; Fig. 1; Fig. 2). In
S. cerevisiae, this enzyme is encoded by the ALG3 gene. As described above,
S. cerevisiae cells harboring a leaky alg3-1 mutation accumulate
Man5GlcNAc2-PP-Dol and cells having a deletion in alg3 appear to transfer Man5GlcNAc2
structures onto nascent polypeptide chains within the ER. Accordingly, in this
embodiment, host cells will accumulate N-glycans enriched in Man5GlcNAc2
structures which can then be converted to GlcNAc2Man3GlcNAc2 by treatment
with glycosidases (e.g., with α-1,2 mannosidase, α-1,3 mannosidase or α-1,2-1,3
mannosidase activities) (Fig. 2).
[0115] As described in Example 1, degenerate primers were designed based on
an alignment of Alg3 protein sequences from S. cerevisiae, D. melanogaster and
humans (H. sapiens) (Figs. 4 and 5), and were used to amplify a product from P.
pastoris genomic DNA. The resulting PCR product was used as a probe to identify
and isolate a P. pastoris genomic clone comprising an open reading frame (ORF)
that encodes a protein having 35% overall sequence identity and 53% sequence
similarity to the S. cerevisiae ALG3 gene (Figs. 6 and 7). This P. pastoris gene is
referred to herein as "PpALG3". The ALG3 gene was similarly identified and
isolated from K. lactis (Example 1; Figs. 8 and 9).

[0116] Thus, in another embodiment, the invention provides an isolated nucleic
acid molecule having a nucleic acid sequence comprising or consisting of at least
forty-five, preferably at least 50, more preferably at least 60 and most preferably
75 or more nucleotide residues of the P. pastoris ALG3 gene (Fig. 6) and the K.
lactis ALG3 gene (Fig. 8), and homologs, variants and derivatives thereof. The
invention also provides nucleic acid molecules that hybridize under stringent
conditions to the above-described nucleic acid molecules. Similarly, isolated
polypeptides (including muteins, allelic variants, fragments, derivatives, and
analogs) encoded by the nucleic acid molecules of the invention are provided
(P. pastoris and K. lactis ALG3 gene products are shown in Fig. 6 and 8). In
addition, also provided are vectors, including expression vectors, which comprise a
nucleic acid molecule of the invention, as described further herein.

[0117] Using gene-specific primers, a construct was made to delete the PpALG3
gene from the genome of P. pastoris (Example 1). This strain was used to
generate a host cell depleted in Dol-P-Man:Man5GlcNAc2-PP-Dol
Mannosyltransferase activity and produce lipid-linked Man5GlcNAc2-PP-Dol
precursors which are transferred onto nascent polypeptide chains to produce
N-glycans having a Man5GlcNAc2 carbohydrate structure.
[0118] As described in Example 2, such a host cell may be engineered by
expression of appropriate mannosidases to produce N-glycans having the desired
Man3GlcNAc2 core carbohydrate structure. Expression of GnTs in the host cell
(e.g., by targeting a nucleic acid molecule or a library of nucleic acid molecules as
described below) enables the modified host cell to produce N-glycans having one
or two GlcNAc structures attached to each arm of the Man3 core structure (i.e.,
GlcNAc1Man3GlcNAc2 or GlcNAc2Man3GlcNAc2; see Fig. 3). These structures
may be processed further using the methods of the invention to produce human-like
N-glycans on proteins which enter the secretion pathway of the host cell.
[0119] In another embodiment, the host cell has diminished or depleted
dolichyl-P-Man:Man6GlcNAc2-PP-dolichyl α-1,2 mannosyltransferase activity, which is an
α-1,2 mannosyltransferase activity involved in the mannosylation step converting
Man6GlcNAc2-PP-Dol to Man7GlcNAc2-PP-Dol at the luminal side of the ER (see
above and Figs. 1 and 2). In S. cerevisiae, this enzyme is encoded by the ALG9
gene. Cells harboring an alg9 mutation accumulate Man6GlcNAc2-PP-Dol (Fig. 2)
and transfer Man6GlcNAc2 structures onto nascent polypeptide chains within the
ER. Accordingly, in this embodiment, host cells will accumulate N-glycans
enriched in Man6GlcNAc2 structures which can then be processed down to core
Man3 structures by treatment with α-1,2 and α-1,3 mannosidases (see Fig. 3 and
Examples 3 and 4).

[0120] A host cell in which the alg9 gene (or gene encoding an equivalent
activity) has been deleted is constructed (see, e.g., Example 3). Deletion of ALG9
(or ALG12; see below) creates a host cell which produces N-glycans with one or
two additional mannoses, respectively, on the 1,6 arm (Fig. 2). In order to make
the 1,6 core-mannose accessible to N-acetylglucosaminyltransferase II (GnT II),
these mannoses have to be removed by glycosidase(s). ER mannosidase typically
will remove the terminal 1,2 mannose on the 1,6 arm and subsequently
Mannosidase II (alpha 1-3,6 mannosidase) or other mannosidases such as alpha
1,2, alpha 1,3 or alpha 1-2,3 mannosidases (e.g., from Xanthomonas manihotis; see
Example 4) can act upon the 1,6 arm and subsequently GnT II can transfer an
N-acetylglucosamine, resulting in GlcNAc2Man3 (Fig. 2).

[0121] The resulting host cell, which is depleted for alg9p activity, is engineered
to express α-1,2 and α-1,3 mannosidase activity (from one or more enzymes),
preferably by expression from a nucleic acid molecule introduced into the host cell
which expresses an enzyme targeted to a preferred subcellular compartment
(see below). Example 4 describes the cloning and expression of one such enzyme
from Xanthomonas manihotis.

[0122] In another embodiment, the host cell has diminished or depleted
dolichyl-P-Man:Man7GlcNAc2-PP-dolichyl α-1,6 mannosyltransferase activity, which is an
α-1,6 mannosyltransferase activity involved in the mannosylation step converting
Man7GlcNAc2-PP-Dol to Man8GlcNAc2-PP-Dol (which mannosylates the α-1,6
mannose on the 1,6 arm of the core mannose structure) at the luminal side of the
ER (see above and Figs. 1 and 2). In S. cerevisiae, this enzyme is encoded by the
ALG12 gene. Cells harboring an alg12 mutation accumulate Man7GlcNAc2-PP-Dol
(Fig. 2) and transfer Man7GlcNAc2 structures onto nascent polypeptide chains
within the ER. Accordingly, in this embodiment, host cells will accumulate
N-glycans enriched in Man7GlcNAc2 structures which can then be processed down to
core Man3 structures by treatment with α-1,2 and α-1,3 mannosidases (see Fig. 3
and Examples 3 and 4).

[0123] As described above for alg9 mutant hosts, the resulting host cell, which is
depleted for alg12p activity, is engineered to express α-1,2 and α-1,3 mannosidase
activity (e.g., from one or more enzymes), preferably by expression from one
or more nucleic acid molecules introduced into the host cell which express an
enzyme activity targeted to a preferred subcellular compartment (see
below).
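
The three alg embodiments described above differ in which lipid-linked precursor accumulates and, therefore, in how many mannose residues must be trimmed to reach the Man3 core. The following Python sketch summarizes that bookkeeping. The mannose counts are taken from paragraphs [0114], [0119] and [0122]; the names `PRECURSOR_MANNOSES`, `precursor_name` and `mannoses_to_trim` are hypothetical and serve only as an illustration.

```python
# Number of mannose residues on the lipid-linked precursor that accumulates
# in each mutant background, per paragraphs [0114], [0119] and [0122].
PRECURSOR_MANNOSES = {"alg3": 5, "alg9": 6, "alg12": 7}

# The target core structure is Man3GlcNAc2.
CORE_MANNOSES = 3


def precursor_name(mutant: str) -> str:
    """Name of the structure transferred to protein in the given mutant."""
    return f"Man{PRECURSOR_MANNOSES[mutant.lower()]}GlcNAc2"


def mannoses_to_trim(mutant: str) -> int:
    """Mannose residues to be removed by mannosidases to reach the Man3 core."""
    return PRECURSOR_MANNOSES[mutant.lower()] - CORE_MANNOSES
```

For instance, an alg3 background calls for removal of two mannoses, whereas an alg12 background calls for removal of four.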
[0124] 

Engineering or Selecting Hosts Optionally Having Decreased Initiating
α-1,6 Mannosyltransferase Activity

[0125] In a preferred embodiment, the method of the invention involves making 
or using a host cell which is both (a) diminished or depleted in the activity of an 






alg gene or in one or more activities that mannosylate N-glycans on the α-1,6 arm
of the Man3GlcNAc2 ("Man3") core carbohydrate structure; and (b) diminished or
depleted in the activity of an initiating α-1,6-mannosyltransferase, i.e., an initiation
specific enzyme that initiates outer chain mannosylation (on the α-1,3 arm of the
Man3 core structure). In S. cerevisiae, this enzyme is encoded by the OCH1 gene.
Disruption of the och1 gene in S. cerevisiae results in a phenotype in which
N-linked sugars completely lack the poly-mannose outer chain. Previous approaches
for obtaining mammalian-type glycosylation in fungal strains have required
inactivation of OCH1 (see, e.g., Chiba, 1998). Disruption of the initiating
α-1,6-mannosyltransferase activity in a host cell of the invention is optional, however
(depending on the selected host cell), as the Och1p enzyme requires an intact
Man8GlcNAc for efficient mannose outer chain initiation. Thus, the host cells
selected or produced according to this invention, which accumulate lipid-linked
oligosaccharides having seven or fewer mannose residues will, after transfer,
produce hypoglycosylated N-glycans that will likely be poor substrates for Och1p
(see, e.g., Nakayama, 1997).

Engineering or Selecting Hosts Having Increased Glucosyltransferase Activity

[0126] As discussed above, glucosylated oligosaccharides are thought to be
transferred to nascent polypeptide chains at a much higher rate than their
nonglucosylated counterparts. It appears that substrate recognition by the
oligosaccharyltransferase complex is enhanced by addition of glucose to the
antennae of lipid-linked oligosaccharides. It is thus desirable to create or select
host cells capable of optimal glucosylation of the lipid-linked oligosaccharides. In
such host cells, underglycosylation will be substantially decreased or even
abolished, due to a faster and more efficient transfer of glucosylated Man5
structures onto the nascent polypeptide chain.

[0127] Accordingly, in another embodiment of the invention, the method is
directed to making a host cell in which the lipid-linked N-glycan precursors are
transferred efficiently to the nascent polypeptide chain in the ER. In a preferred
embodiment, transfer is augmented by increasing the level of glucosylation on the
branches of lipid-linked oligosaccharides which, in turn, will make them better
substrates for oligosaccharyltransferase.

[0128] In one preferred embodiment, the invention provides a method for making
a human-like glycoprotein which uses a host cell in which one or more enzymes
responsible for glucosylation of lipid-linked oligosaccharides in the ER has
increased activity. One way to enhance the degree of glucosylation of the
lipid-linked oligosaccharides is to overexpress one or more enzymes responsible for the
transfer of glucose residues onto the antennae of the lipid-linked oligosaccharide.
In particular, increasing α-1,3 glucosyltransferase activity will increase the amount
of glucosylated lipid-linked Man5 structures and will reduce or eliminate the
underglycosylation of secreted proteins. In S. cerevisiae, this enzyme is encoded
by the ALG6 gene.

[0129] Saccharomyces cerevisiae ALG6 and its human counterpart have been
cloned (Imbach, 1999; Reiss, 1996). Due to the evolutionary conservation of the
early steps of glycosylation, ALG6 loci are expected to be homologous between
species and may be cloned based on sequence similarities by anyone skilled in the
art. (The same holds true for cloning and identification of ALG8 and ALG10 loci
from different species.) In addition, different glucosyltransferases from different
species can then be tested to identify the ones with optimal activities.

[0130] The introduction of additional copies of an ALG6 gene and/or the
expression of ALG6 under the control of a strong promoter, such as the GAPDH
promoter, is one of several ways to increase the degree of glucosylation of
lipid-linked oligosaccharides. The ALG6 gene from P. pastoris is cloned and expressed
(Example 5). ALG6 nucleic acid and amino acid sequences are shown in Fig. 25 (S.
cerevisiae) and Fig. 26 (P. pastoris). These sequences are compared to other
eukaryotic ALG6 sequences in Fig. 27.

[0131] Accordingly, another embodiment of the invention provides a method to
enhance the degree of glucosylation of lipid-linked oligosaccharides comprising
the step of increasing alpha-1,3 glucosyltransferase activity in a host cell. The
increase in activity may be achieved by overexpression of nucleic acid sequences
encoding the activity, e.g., by operatively linking the nucleic acid encoding the
activity with one or more heterologous expression control sequences. Preferred
expression control sequences include transcription initiation, termination, promoter
and enhancer sequences; RNA splice donor and polyadenylation signals; mRNA
stabilizing sequences; ribosome binding sites; protein stabilizing sequences; and
protein secretion sequences.
[0132] In another embodiment, the increase in alpha-1,3 glucosyltransferase
activity is achieved by introducing a nucleic acid molecule encoding the activity on
a multi-copy plasmid, using techniques well known to the skilled worker. In yet
another embodiment, the degree of glucosylation of lipid-linked oligosaccharides
comprising decreasing the substrate specificity of oligosaccharyl transferase
activity in a host cell. This is achieved by, for example, subjecting at least one
nucleic acid encoding the activity to a technique such as gene shuffling, in vitro
mutagenesis, and error-prone polymerase chain reaction, all of which are
well-known to one of skill in the art. Naturally, ALG8 and ALG10 can be
overexpressed in a host cell and tested in a similar fashion.

[0133] Accordingly, in a preferred embodiment, the invention provides a method
for making a human-like glycoprotein using a host cell which is engineered or
selected so that one or more enzymes responsible for glucosylation of lipid-linked
oligosaccharides in the ER has increased activity. In a more preferred
embodiment, the invention uses a host cell that is both (a) diminished or depleted
in the activity of one or more alg gene activities or activities that mannosylate
N-glycans on the α-1,6 arm of the Man3GlcNAc2 ("Man3") core carbohydrate
structure and (b) engineered or selected so that one or more enzymes responsible
for glucosylation of lipid-linked oligosaccharides in the ER has increased activity.
The lipid-linked Man5 structure found in an alg3 mutant background, however, is
not a preferred substrate for Alg6p. Accordingly, the skilled worker may identify
Alg6p, Alg8p and Alg10p with an increased substrate specificity (Gibbs, 2001),
e.g., by subjecting nucleic acids encoding such enzymes to one or more rounds of
gene shuffling, error prone PCR, or in vitro mutagenesis approaches and selecting
for increased substrate specificity in a host cell of interest, using molecular biology
and genetic selection techniques well known to those of skill in the art. It will be
appreciated by the skilled worker that such techniques for improving enzyme
substrate specificities in a selected host strain are not limited to this particular






embodiment of the invention but rather, may be used in any embodiment to 
optimize further the production of human-like N-glycans in a non-human host cell. 
[0134] As described, once Man5 is transferred onto the nascent polypeptide
chain, expression of suitable α-1,2-mannosidase(s), as provided by the present
invention, will further trim Man5GlcNAc2 structures to yield the desired core
Man3GlcNAc2 structures. α-1,2-mannosidases remove only terminal α-1,2-linked
mannose residues and are expected to recognize the Man5GlcNAc2-
Man7GlcNAc2 specific structures made in alg3, 9 and 12 mutant host cells and in
host cells in which homologs to these genes are mutated.

[0135] As schematically presented in Figure 3, co-expression of appropriate
UDP-sugar-transporter(s) and -transferase(s) will cap the terminal α-1,6 and α-1,3
residues with GlcNAc, resulting in the necessary precursor for mammalian-type
complex and hybrid N-glycosylation: GlcNAc2Man3GlcNAc2. The peptide-bound
N-linked oligosaccharide chain GlcNAc2Man3GlcNAc2 (Figure 3) then serves as a
precursor for further modification to a mammalian-type oligosaccharide structure.
Subsequent expression of galactosyltransferases and genetically engineering the
capacity to transfer sialic acid will produce a mammalian-type (e.g., human-like)
N-glycan structure.

[0136] A desired host cell according to the invention can be engineered one
enzyme or more than one enzyme at a time. In addition, a library of genes
encoding potentially useful enzymes can be created, and a strain having one or
more enzymes with optimal activities or producing the most "human-like"
glycoproteins, selected by transforming target host cells with one or more members
of the library. Lower eukaryotes that are able to produce glycoproteins having the
core N-glycan Man3GlcNAc2 are particularly useful because of the ease of
performing genetic manipulations, and safety and efficiency features. In a
preferred embodiment, at least one further glycosylation reaction is performed, ex
vivo or in vivo, to produce a human-like N-glycan. In a more preferred
embodiment, active forms of glycosylating enzymes are expressed in the
endoplasmic reticulum and/or Golgi apparatus of the host cell to produce the
desired human-like glycoprotein.

Host Cells

[0137] A preferred non-human host cell of the invention is a lower eukaryotic
cell, e.g., a unicellular or filamentous fungus, which is diminished or depleted in
the activity of one or more alg gene activities (including an enzymatic activity
which is a homolog or equivalent to an alg activity). Another preferred host cell of
the invention is diminished or depleted in the activity of one or more enzymes
(other than alg activities) that mannosylate the α-1,6 arm of a lipid-linked
oligosaccharide structure.

[0138] While lower eukaryotic host cells are preferred, a wide variety of host
cells having the aforementioned properties are envisioned as being useful in the
methods of the invention. Plant cells, for instance, may be engineered to express a
human-like glycoprotein according to the invention. Likewise, a variety of
non-human, mammalian host cells may be altered to express more human-like
glycoproteins using the methods of the invention. An appropriate host cell can be
engineered, or one of the many such mutants already described in yeasts may be
used. A preferred host cell of the invention, as exemplified herein, is a
hypermannosylation-minus (OCH1) mutant in Pichia pastoris which has further
been modified to delete the alg3 gene. Other preferred hosts are Pichia pastoris
mutants having och1 and alg9 or alg12 mutations.


Formation of complex N-glycans 

[0139] The sequential addition of sugars to the modified, nascent N-glycan
structure involves the successful targeting of glycosyltransferases into the Golgi
apparatus and their successful expression. This process requires the functional
expression, e.g., of GnT I, in the early or medial Golgi apparatus as well as
ensuring a sufficient supply of UDP-GlcNAc (e.g., by expression of a
UDP-GlcNAc transporter).

[0140] To characterize the glycoproteins and to confirm the desired
glycosylation, the glycoproteins were purified, the N-glycans were PNGase-F
released and then analyzed by MALDI-TOF-MS (Example 2). Kringle 3 domain
of human plasminogen was used as the reporter protein. This soluble glycoprotein
was produced in P. pastoris in an alg3, och1 knockout background (Example 2).
[0141] GlcNAcMan5GlcNAc2 was produced as the predominant N-glycan after
addition of human GnT I and K. lactis UDP-GlcNAc transporter, as shown in Fig. 16
(Example 2). The mass of this N-glycan is consistent with the mass of
GlcNAcMan5GlcNAc2 at 1463 (m/z). To confirm the addition of the GlcNAc onto
Man5GlcNAc2, a β-N-hexosaminidase digest was performed, which revealed a
peak at 1260 (m/z), consistent with the mass of Man5GlcNAc2 (Fig. 17).
[0142] The N-glycans from the alg3 och1 deletion in one strain PBP3 (Example
2) provided two distinct peaks at 1138 (m/z) and 1300 (m/z), which is consistent
with structures GlcNAcMan3GlcNAc2 and GlcNAcMan4GlcNAc2 (Fig. 18). After
an in vitro α1,2-mannosidase digestion for redundant mannoses, a peak eluted at
1138 (m/z), which is consistent with GlcNAcMan3GlcNAc2 (Fig. 19). To confirm
the addition of the GlcNAc onto the Man3GlcNAc2 structure, a
β-N-hexosaminidase digest was performed, which revealed a peak at 934 (m/z),
consistent with the mass of Man3GlcNAc2 (Fig. 20).

[0143] The addition of the second GlcNAc onto GlcNAcMan3GlcNAc2 is shown
in Fig. 21. The peak at 1357 (m/z) corresponds to GlcNAc2Man3GlcNAc2. To
confirm the addition of the two GlcNAcs onto the core mannose structure
Man3GlcNAc2, another β-N-hexosaminidase digest was performed, which revealed
a peak at 934 (m/z), consistent with the mass of Man3GlcNAc2 (Fig. 22). This is
conclusive data displaying a complex-type glycoprotein made in yeast cells.
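
The digest experiments above are read by comparing the shift between two peaks with the mass of the sugar residue expected to be removed or added. The following Python sketch shows that reasoning under stated assumptions. The residue masses are standard monoisotopic values and are not taken from the specification; the function name `explain_shift`, the 2.0 Da tolerance and the limit of two residues per type are arbitrary choices for illustration. The peaks are the nominal m/z values reported in the text, so some peak pairs (for example, peaks taken from different spectra) may return no match at this tolerance.

```python
from itertools import product

# Standard monoisotopic residue masses in Da (assumed values, not from the
# specification): hexose (e.g., mannose, galactose), N-acetylhexosamine
# (e.g., GlcNAc) and N-acetylneuraminic acid.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "NeuAc": 291.0954}


def explain_shift(delta: float, tol: float = 2.0, max_count: int = 2):
    """List residue compositions whose total mass is within tol of |delta|."""
    target = abs(delta)
    names = list(RESIDUE_MASS)
    hits = []
    # Enumerate every combination of 0..max_count copies of each residue.
    for counts in product(range(max_count + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue
        mass = sum(c * RESIDUE_MASS[n] for c, n in zip(counts, names))
        if abs(mass - target) <= tol:
            hits.append({n: c for n, c in zip(names, counts) if c})
    return hits
```

With the peaks reported for Figs. 16 and 17, `explain_shift(1463 - 1260)` points to a single HexNAc, in line with removal of one GlcNAc by the hexosaminidase digest; with the two peaks of Fig. 18, `explain_shift(1300 - 1138)` points to a single hexose, in line with one additional mannose.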
[0144] The in vitro addition of UDP-galactose and β1,4-galactosyltransferase
onto the GlcNAc2Man3GlcNAc2 resulted in a peak at 1664 (m/z), which is
consistent with the mass of Gal2GlcNAc2Man3GlcNAc2 (Fig. 23). Finally, the in
vitro addition of CMP-N-acetylneuraminic acid and sialyltransferase resulted in a
peak at 2248 (m/z), which is consistent with the mass of
NANA2Gal2GlcNAc2Man3GlcNAc2 (Fig. 24). The above data support the use of
non-mammalian host cells, which are capable of producing complex human-like
glycoproteins.

Targeting of glycosyl- and galactosyl-transferases to specific organelles.

[0145] Much work has been dedicated to revealing the exact mechanism by
which these enzymes are retained and anchored to their respective organelle.
Although complex, evidence suggests that the stem region, membrane spanning
region and cytoplasmic tail individually or in concert direct enzymes to the
membrane of individual organelles and thereby localize the associated catalytic
domain to that locus.

[0146] The method by which active glycosyltransferases can be expressed and
directed to the appropriate organelle such that a sequential order of reactions may
occur, that leads to complex N-glycan formation, is as follows:

(A) Establish a DNA library of regions that are known to encode proteins/peptides
that mediate localization to a particular location in the secretory pathway (ER,
Golgi and trans Golgi network). A limited selection of such enzymes and their
respective location is shown in Table 1. These sequences may be selected from
the host to be engineered as well as other related or unrelated organisms. Generally
such sequences fall into three categories: (1) N-terminal sequences encoding a
cytosolic tail (ct), a transmembrane domain (tmd) and part of a somewhat more
ambiguously defined stem region (sr), which together or individually anchor
proteins to the inner (lumenal) membrane of the Golgi, (2) retrieval signals which
are generally found at the C-terminus such as the HDEL or KDEL tetrapeptide,
and (3) membrane spanning nucleotide sugar transporters, which are known to
locate in the Golgi. In the first case, where the localization region consists of
various elements (ct, tmd and sr) the library is designed such that the ct, the tmd
and various parts of the stem region are represented. This may be accomplished by
using PCR primers that bind to the 5' end of the DNA encoding the cytosolic
region and employing a series of opposing primers that bind to various parts of the
stem region. In addition one would create fusion protein constructs that encode
sugar nucleotide transporters and known retrieval signals.

(B) A second step involves the creation of a series of fusion protein constructs
that encode the above-mentioned localization sequences and the catalytic domain
of a particular glycosyltransferase (e.g. GnT I, GalT, fucosyltransferase or ST)
cloned in frame to such a localization sequence. In the case of a sugar
nucleotide transporter fused to a catalytic domain, one may design such
constructs such that the catalytic domain (e.g. GnT I) is either at the N- or
the C-terminus of the resulting polypeptide. The catalytic domain, like the
localization sequence, may be derived from various different sources. The choice
of such catalytic domains may be guided by knowledge of the particular
environment in which the catalytic domain is to be active. For example, if a
particular glycosyltransferase is to be active in the late Golgi, and all known
enzymes of the host organism in the late Golgi have a pH optimum of 7.0, or the
late Golgi is known to have a particular pH, one would try to select a catalytic
domain that has maximum activity at that pH. Existing in vivo data on the
activity of such enzymes in particular hosts may also be of use. For example,
Schwientek and coworkers showed that GalT activity can be engineered into the
Golgi of S. cerevisiae and showed that such activity was present by
demonstrating the transfer of some Gal to existing GlcNAc2 in an alg mutant of
S. cerevisiae. In addition, one may perform several rounds of gene shuffling or
error-prone PCR to obtain a larger diversity within the pool of fusion
constructs, since it has been shown that single amino acid mutations may
drastically alter the activity of glycoprotein processing enzymes (Romero et
al., 2000). Full-length sequences of glycosyltransferases and their endogenous
anchoring sequences may also be used. In a preferred embodiment, such
localization/catalytic domain libraries are designed to incorporate existing
information on the sequential nature of glycosylation reactions in higher
eukaryotes. In other words, reactions known to occur early in the course of
glycoprotein processing require the targeting of enzymes that catalyze such
reactions to an early part of the Golgi or the ER. For example, the trimming of
Man8GlcNAc2 to Man5GlcNAc2 is an early step in complex N-glycan formation.
Since protein processing is initiated in the ER and then proceeds through the
early, medial and late Golgi, it is desirable to have this reaction occur in the
ER or early Golgi. When designing a library for mannosidase I localization, one
thus attempts to match ER and early Golgi targeting signals with the catalytic
domain of mannosidase I.
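
By way of illustration only, the combinatorial pairing of localization sequences
with catalytic domains described in steps (A) and (B) may be sketched in Python.
The dictionary layout, the function name build_library and the "preferred
compartment" sets are assumptions made for this sketch and are not part of the
disclosed method; the leader locations are those listed in Table 1, and a real
library would hold DNA fragments rather than text labels.

```python
from itertools import product

# Localization sequences and the compartment they target (locations per Table 1).
LEADERS = {
    "MnsI": "ER",
    "Och1": "Golgi (cis)",
    "Mnt1": "Golgi (cis)",
    "HDEL": "ER",
}

# Catalytic domains and the compartments where they should act.
# These preference sets are hypothetical and only illustrate the matching idea.
DOMAINS = {
    "mannosidase I": {"ER", "Golgi (cis)"},                          # early step
    "sialyltransferase": {"Golgi (trans)", "trans-Golgi network"},   # late step
}


def build_library(leaders, domains, match_compartment=True):
    """Enumerate leader/catalytic-domain fusions.

    With match_compartment=False every pairing is returned (full combinatorial
    library); with True only pairings whose leader targets a compartment
    preferred by the catalytic domain are kept.
    """
    constructs = []
    for (leader, location), (domain, preferred) in product(
        leaders.items(), domains.items()
    ):
        if match_compartment and location not in preferred:
            continue
        constructs.append((leader, domain, location))
    return constructs
```

With these example entries the unrestricted library contains eight constructs,
while the compartment-matched library retains only the four mannosidase I
fusions, since none of the example leaders targets the late Golgi.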

[0147] Upon transformation of the host strain with the fusion construct library,
a selection process is used to identify which particular combination of
localization sequence and catalytic domain in fact has the maximum effect on the
carbohydrate structure found in such host strain. Such selection can be based on
any number of assays or detection methods. They may be carried out manually or
may be automated through the use of high-throughput screening equipment.
[0148] In another example, GnT I activity is required for the maturation of
complex N-glycans, because only after addition of GlcNAc to the terminal α1,3
mannose residue may further trimming of such a structure to the subsequent
intermediate GlcNAcMan3GlcNAc2 structure occur. Mannosidase II is most likely
not capable of removing the terminal α1,3- and α1,6-mannose residues in the
absence of a terminal β1,2-GlcNAc, and thus the formation of complex N-glycans
will not proceed in the absence of GnT I activity (Schachter, 1991).
Alternatively, one may first engineer or select a strain that makes sufficient
quantities of Man5GlcNAc2 as described in this invention by engineering or
selecting a strain deficient in Alg3p activity. In the presence of sufficient
UDP-GlcNAc transporter activity, as may be achieved by engineering or selecting
a strain that has such UDP-GlcNAc transporter activity, GlcNAc can be added to
the terminal α-1,3 residue by GnT I, as in vitro a Man3 structure is recognized
by rat liver GnT I (Moller, 1992).
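
The order dependence described above (mannosidase II acting only after GnT I)
may be illustrated with a small Python model. This is a sketch under stated
assumptions: the Glycan class and the functions gnt1 and mannosidase2 are
hypothetical names, the glycan is reduced to simple residue counts, and the
GlcNAcMan5GlcNAc2 intermediate is taken from the generally accepted mammalian
pathway rather than from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glycan:
    terminal_glcnac: int  # GlcNAc residues added to the mannose arms
    man: int              # mannose residues; the two core GlcNAc are implicit

    def name(self):
        if self.terminal_glcnac == 0:
            prefix = ""
        elif self.terminal_glcnac == 1:
            prefix = "GlcNAc"
        else:
            prefix = f"GlcNAc{self.terminal_glcnac}"
        return f"{prefix}Man{self.man}GlcNAc2"


def gnt1(glycan):
    """GnT I adds one GlcNAc to Man5GlcNAc2; other inputs are left unchanged."""
    if glycan.man == 5 and glycan.terminal_glcnac == 0:
        return Glycan(terminal_glcnac=1, man=5)
    return glycan


def mannosidase2(glycan):
    """Mannosidase II removes two mannoses, but only if a terminal GlcNAc is present."""
    if glycan.man == 5 and glycan.terminal_glcnac >= 1:
        return Glycan(terminal_glcnac=glycan.terminal_glcnac, man=3)
    return glycan
```

In this model, applying mannosidase2 directly to Man5GlcNAc2 leaves the glycan
untouched, whereas applying gnt1 first and mannosidase2 second yields
GlcNAcMan3GlcNAc2.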

[0149] In another approach, one may incorporate the expression of a UDP-GlcNAc
transporter into the library mentioned above such that the desired construct
will contain: (1) a region by which the transformed construct is maintained in
the cell (e.g. an origin of replication or a region that mediates chromosomal
integration), (2) a marker gene that allows for the selection of cells that have
been transformed, including counterselectable and recyclable markers such as
ura3 or T-urf13 (Soderholm, 2001) or other well-characterized selection markers
(e.g., his4, bla, Sh ble, etc.), (3) a gene encoding a UDP-GlcNAc transporter
(e.g. from K. lactis (Abeijon, 1996), or from H. sapiens (Ishida, 1996)), and
(4) a promoter activating the expression of the above-mentioned
localization/catalytic domain fusion construct library.
[0150] After transformation of the host with the library of fusion constructs
described above, one may screen for those cells that have the highest
concentration of terminal GlcNAc on the cell surface, or secrete the protein
with the highest terminal GlcNAc content. Such a screen may be based on a visual
method, like a staining procedure, the ability to bind specific terminal GlcNAc
binding antibodies or lectins conjugated to a marker (such lectins are available
from E.Y. Laboratories Inc., San Mateo, CA), the reduced ability of specific
lectins to bind to terminal mannose residues, the ability to incorporate a
radioactively labeled sugar in vitro, altered binding to dyes or charged
surfaces, or may be accomplished by using a Fluorescence-Activated Cell Sorting
(FACS) device in conjunction with a fluorophore-labeled lectin or antibody
(Guillen, 1998). It may be advantageous to enrich particular phenotypes within
the transformed population with cytotoxic lectins. U.S. Patent No. 5,595,900
teaches several methods by which cells with desired extracellular carbohydrate
structures may be identified. Repeatedly carrying out this strategy allows for
the sequential engineering of more and more complex glycans in lower eukaryotes.

[0151] After transformation, one may select for transformants that allow for the
most efficient transfer of GlcNAc by GlcNAc Transferase II from UDP-GlcNAc in
an in vitro assay. This screen may be carried out by growing cells harboring the
transformed library under selective pressure on an agar plate and transferring
individual colonies into a 96-well microtiter plate. After growing the cells,
the cells are centrifuged and resuspended in buffer, and after addition of
UDP-GlcNAc and GnT V, the release of UDP is determined either by HPLC or by an
enzyme-linked assay for UDP. Alternatively, one may use radioactively labeled
UDP-GlcNAc and GnT V, wash the cells and then look for the release of
radioactive GlcNAc by N-acetylglucosaminidase. All this may be carried out
manually or automated through the use of high-throughput screening equipment.
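
As a purely illustrative sketch of how the read-out of such a 96-well screen
might be evaluated, the following Python fragment ranks wells by the amount of
UDP released, on the premise stated in the text that higher release indicates
the phenotype of interest. The function name rank_transformants, the well
identifiers and the numerical readings are invented for this example and do not
represent measured data.

```python
def rank_transformants(udp_release, top_n=3):
    """Return the well IDs with the highest UDP release, best first."""
    ranked = sorted(udp_release.items(), key=lambda item: item[1], reverse=True)
    return [well for well, _ in ranked[:top_n]]


# Invented readings in arbitrary units, keyed by microtiter well.
readings = {"A1": 0.12, "A2": 0.85, "A3": 0.40, "B1": 0.91}
best_wells = rank_transformants(readings, top_n=2)
```

Colonies from the top-ranked wells would then be carried forward for
confirmation by an independent method such as lectin binding.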

[0152] Transformants that release more UDP in the first assay, or more
radioactively labeled GlcNAc in the second assay, are expected to have a higher
degree of GlcNAcMan3GlcNAc2 (Fig. 3) on their surface and thus constitute the
desired phenotype. Alternatively, one may use any other suitable screen, such as
a lectin binding assay, that is able to reveal altered glycosylation patterns on
the surface of transformed cells. In this case the reduced binding of lectins
specific to terminal mannoses may be a suitable selection tool. Galanthus
nivalis lectin binds specifically to terminal α-1,3 mannose, which is expected
to be reduced if sufficient mannosidase II activity is present in the Golgi.
One may also enrich for desired transformants by carrying out a chromatographic
separation step that allows for the removal of cells containing a high terminal
mannose content. This separation step would be carried out with a lectin column
that specifically binds cells with a high terminal mannose content (e.g.
Galanthus nivalis lectin bound to agarose, Sigma, St. Louis, MO) over those that
have a low terminal mannose content. In addition, one may directly create such
fusion protein constructs, as additional information on the localization of
active carbohydrate modifying enzymes in different lower eukaryotic hosts
becomes available in the scientific literature. For example, the prior art
teaches us that human β1,4-GalTr can be fused to the membrane domain of MNT, a
mannosyltransferase from S. cerevisiae, and localized to the Golgi apparatus
while retaining its catalytic activity (Schwientek et al., 1995). If
S. cerevisiae or a related organism is the host to be engineered, one may
directly incorporate such findings into the overall strategy to obtain complex
N-glycans from such a host. Several such gene fragments in P. pastoris have been
identified that are related to glycosyltransferases in S. cerevisiae and thus
could be used for that purpose.

Table 1.

Gene or sequence     Organism                    Function                  Location
MnsI                 S. cerevisiae               mannosidase               ER
Och1                 S. cerevisiae               1,6-mannosyltransferase   Golgi (cis)
[...]                S. cerevisiae               1,2-mannosyltransferase   Golgi (medial)
[...]                S. cerevisiae               1,3-mannosyltransferase   Golgi (trans)
Och1                 [...]                       1,6-mannosyltransferase   Golgi (cis)
[...]                H. sapiens, S. frugiperda   2,6-sialyltransferase     trans-Golgi network
β1,4 Gal T           bovine milk                 UDP-Gal transporter       Golgi
Mnt1                 S. cerevisiae               1,2-mannosyltransferase   Golgi (cis)
HDEL at C-terminus   S. cerevisiae               retrieval signal          ER

Integration Sites

[0153] As one ultimate goal of this genetic engineering effort is a robust
protein production strain that is able to perform well in an industrial
fermentation process, the integration of multiple genes into the host (e.g.,
fungal) chromosome involves careful planning. The engineered strain will most
likely have to be transformed with a range of different genes, and these genes
will have to be transformed in a stable fashion to ensure that the desired
activity is maintained throughout the fermentation process. Any combination of
the following enzyme activities will have to be engineered into the fungal
protein expression host: sialyltransferases, mannosidases, fucosyltransferases,
galactosyltransferases, glucosyltransferases, GlcNAc transferases, ER and Golgi
specific transporters (e.g. symport and antiport transporters for UDP-galactose
and other precursors), other enzymes involved in the processing of
oligosaccharides, and enzymes involved in the synthesis of activated
oligosaccharide precursors such as UDP-galactose and CMP-N-acetylneuraminic
acid. At the same time, a number of genes which encode enzymes known to be
characteristic of non-human glycosylation reactions will have to be deleted.
Such genes and their corresponding proteins have been extensively characterized
in a number of lower eukaryotes (e.g. S. cerevisiae, T. reesei, A. nidulans,
etc.), thereby providing a list of known glycosyltransferases in lower
eukaryotes, their activities and their respective genetic sequences. These
genes are likely to be selected from the group of mannosyltransferases, e.g.
1,3 mannosyltransferases (e.g. MNN1 in S. cerevisiae) (Graham, 1991), 1,2
mannosyltransferases (e.g. the KTR/KRE family from S. cerevisiae), 1,6
mannosyltransferases (OCH1 from S. cerevisiae), mannosylphosphate transferases
(MNN4 and MNN6 from S. cerevisiae) and additional enzymes that are involved in
aberrant (i.e., non-human) glycosylation reactions. Many of these genes have in
fact been deleted individually, giving rise to viable phenotypes with altered
glycosylation profiles. Examples are shown in Table 2:



Table 2.

Strain                      Mutant             Structure wild type          Structure mutant   Authors
Schizosaccharomyces pombe   OCH1               Mannan (i.e. Man>9GlcNAc2)   Man8GlcNAc2        Yoko-o et al., 2001
S. cerevisiae               OCH1, MNN1         Mannan (i.e. Man>9GlcNAc2)   Man8GlcNAc2        Nakanishi-Shindo et al., 1993
S. cerevisiae               OCH1, MNN1, MNN4   Mannan (i.e. Man>9GlcNAc2)   Man8GlcNAc2        Chiba et al., 1998

As any strategy to engineer the formation of complex N-glycans into a lower
eukaryote involves both the elimination and the addition of glycosyltransferase
activities, a comprehensive scheme will attempt to coordinate both
requirements. Genes that encode undesirable enzymes serve as potential
integration sites for genes that are desirable. For example, 1,6
mannosyltransferase activity is a hallmark of glycosylation in many known lower
eukaryotes. The gene encoding alpha-1,6 mannosyltransferase (OCH1) has been
cloned from S. cerevisiae, and mutations in the gene give rise to a viable
phenotype with reduced mannosylation. The gene locus encoding alpha-1,6
mannosyltransferase activity is therefore a prime target for the integration of
genes encoding glycosyltransferase activity. In a similar manner, one can choose
a range of other chromosomal integration sites that, based on a gene disruption
event in that locus, are expected to: (1) improve the cell's ability to
glycosylate in a more human-like fashion, (2) improve the cell's ability to
secrete proteins, (3) reduce proteolysis of foreign proteins and (4) improve
other characteristics of the process that facilitate purification or the
fermentation process itself.
Providing sugar nucleotide precursors 

[0154] A hallmark of higher eukaryotic glycosylation is the presence of
galactose, fucose, and a high degree of terminal sialic acid on glycoproteins.
These sugars are not generally found on glycoproteins produced in yeast and
filamentous fungi, and the method discussed above allows for the engineering of
strains that localize glycosyltransferases in the desired organelle. Complex
N-glycan synthesis is a sequential process by which specific sugar residues are
removed and attached to the core oligosaccharide structure. In higher
eukaryotes, this is achieved by having the substrate sequentially exposed to
various processing enzymes. These enzymes carry out specific reactions
depending on their particular location within the entire processing cascade.
This "assembly line" consists of the ER, the early, medial and late Golgi, and
the trans Golgi network, all with their specific processing environment. To
recreate the processing of human glycoproteins in the Golgi and ER of lower
eukaryotes, numerous enzymes (e.g. glycosyltransferases, glycosidases,
phosphatases and transporters) have to be expressed and specifically targeted
to these organelles, and preferably in a location where they function most
efficiently in relation to their environment as well as to other enzymes in the
pathway.

[0155] Several individual glycosyltransferases have been cloned and expressed in
S. cerevisiae (GalT, GnT I), Aspergillus nidulans (GnT I) and other fungi,
without however demonstrating the desired outcome of "humanization" on the
glycosylation pattern of the organisms (Yoshida, 1995; Schwientek, 1995;
Kalsner, 1995). It was speculated that the carbohydrate structure required to
accept sugars by the action of such glycosyltransferases was not present in
sufficient amounts. While this most likely contributed to the lack of complex
N-glycan formation, there are currently no reports of a fungus supplying a
Man5GlcNAc2 structure, having GnT I activity and having UDP-GlcNAc transporter
activity engineered into the fungus. It is the combination of these three
biochemical events that is required for hybrid and complex N-glycan formation.

[0156] In humans, the full range of nucleotide sugar precursors (e.g.
UDP-N-acetylglucosamine, UDP-N-acetylgalactosamine, CMP-N-acetylneuraminic acid,
UDP-galactose, etc.) are generally synthesized in the cytosol and transported
into the Golgi, where they are attached to the core oligosaccharide by
glycosyltransferases. To replicate this process in lower eukaryotes, sugar
nucleoside specific transporters have to be expressed in the Golgi to ensure
adequate levels of nucleotide sugar precursors (Sommers, 1981; Sommers, 1982;
Perez, 1987). A side product of this reaction is either a nucleoside diphosphate
or monophosphate. While monophosphates can be directly exported in exchange for
nucleoside triphosphate sugars by an antiport mechanism, diphospho nucleosides
(e.g. GDP) have to be cleaved by phosphatases (e.g. GDPase) to yield nucleoside
monophosphates and inorganic phosphate prior to being exported. This reaction
appears to be important for efficient glycosylation, as GDPase from
S. cerevisiae has been found to be necessary for mannosylation. However, the
enzyme only has 10% of the activity towards UDP (Berninsone, 1994). Lower
eukaryotes often do not have UDP-specific diphosphatase activity in the Golgi
since they do not utilize UDP-sugar precursors for glycoprotein synthesis in
the Golgi.
[0157] Schizosaccharomyces pombe, a yeast found to add galactose residues to
cell wall polysaccharides (from UDP-galactose), was found to have specific
UDPase activity, further suggesting the requirement for such an enzyme
(Berninsone et al., 1994). UDP is known to be a potent inhibitor of
glycosyltransferases, and the removal of this glycosylation side product is
important in order to prevent glycosyltransferase inhibition in the lumen of
the Golgi (Khatara et al., 1974). Thus, one may need to provide for the removal
of UDP, which is expected to accumulate in the Golgi of such engineered strains
(Berninsone, 1995; Beaudet, 1998).

[0158] In another example, 2,3-sialyltransferase and 2,6-sialyltransferase cap
galactose residues with sialic acid in the trans-Golgi and TGN of humans,
leading to a mature form of the glycoprotein. To reengineer this processing
step into a metabolically engineered yeast or fungus will require
(1) 2,3-sialyltransferase activity and (2) a sufficient supply of
CMP-N-acetylneuraminic acid in the late Golgi of yeast. To obtain sufficient
2,3-sialyltransferase activity in the late Golgi, the catalytic domain of a
known sialyltransferase (e.g. from humans) has to be directed to the late Golgi
in fungi (see above). Likewise, transporters have to be engineered that allow
the transport of CMP-N-acetylneuraminic acid into the late Golgi. There is
currently no indication that fungi synthesize sufficient amounts of
CMP-N-acetylneuraminic acid, not to mention the transport of such a
sugar nucleotide into the Golgi. Consequently, to ensure the adequate supply of
substrate for the corresponding glycosyltransferases, one has to metabolically
engineer the production of CMP-sialic acid into the fungus.

Methods for providing sugar nucleotide precursors to the Golgi apparatus: 

UDP-N-acetylglucosamine
[0159] The cDNA of human UDP-N-acetylglucosamine transporter, which was
recognized through a homology search in the expressed sequence tags database
(dbEST), was cloned by Ishida and coworkers (Ishida, 1999). Guillen and
coworkers have cloned the mammalian Golgi membrane transporter for
UDP-N-acetylglucosamine by phenotypic correction with cDNA from canine kidney
cells (MDCK) of a recently characterized Kluyveromyces lactis mutant deficient
in Golgi transport of the above nucleotide sugar (Guillen, 1998). Their results
demonstrate that the mammalian Golgi UDP-GlcNAc transporter gene has all of
the necessary information for the protein to be expressed and targeted
functionally to the Golgi apparatus of yeast and that two proteins with very
different amino acid sequences may transport the same solute within the same
Golgi membrane (Guillen, 1998).

GDP-Fucose

[0160] The rat liver Golgi membrane GDP-fucose transporter has been identified
and purified by Puglielli, L. and C. B. Hirschberg (Puglielli, 1999). The
corresponding gene has not been identified; however, N-terminal sequencing can
be used for the design of oligonucleotide probes specific for the corresponding
gene. These oligonucleotides can be used as probes to clone the gene encoding
the GDP-fucose transporter.
UDP-Galactose 

[0161] Two heterologous genes, gma12(+) encoding alpha 1,2-galactosyltransferase
(alpha 1,2 GalT) from Schizosaccharomyces pombe and hUGT2 encoding human
UDP-galactose (UDP-Gal) transporter, have been functionally expressed in
S. cerevisiae to examine the intracellular conditions required for
galactosylation. Correlation between protein galactosylation and UDP-galactose
transport activity indicated that an exogenous supply of UDP-Gal transporter,
rather than alpha 1,2 GalT, played a key role for efficient galactosylation in
S. cerevisiae (Kainuma, 1999). Likewise, a UDP-galactose transporter from
S. pombe was cloned (Aoki, 1999; Segawa, 1999).

CMP-N-acetylneuraminic acid (CMP-Sialic acid)
[0162] Human CMP-sialic acid transporter (hCST) has been cloned and expressed in
Lec 8 CHO cells (Aoki, 1999; Eckhardt, 1997). The functional expression of the
murine CMP-sialic acid transporter was achieved in Saccharomyces cerevisiae
(Berninsone, 1997). Sialic acid has been found in some fungi; however, it is not
clear whether the chosen host system will be able to supply sufficient levels
of CMP-sialic acid. Sialic acid can be either supplied in the medium or,
alternatively, fungal pathways involved in sialic acid synthesis can be
integrated into the host genome.

Diphosphatases

[0163] When sugars are transferred onto a glycoprotein, either a nucleoside
diphosphate or monophosphate is released from the sugar nucleotide precursors.
While monophosphates can be directly exported in exchange for nucleoside
triphosphate sugars by an antiport mechanism, diphospho nucleosides (e.g. GDP)
have to be cleaved by phosphatases (e.g. GDPase) to yield nucleoside
monophosphates and inorganic phosphate prior to being exported. This reaction
appears to be important for efficient glycosylation, as GDPase from
S. cerevisiae has been found to be necessary for mannosylation. However, the
enzyme only has 10% of the activity towards UDP (Berninsone, 1994). Lower
eukaryotes often do not have UDP-specific diphosphatase activity in the Golgi
since they do not utilize UDP-sugar precursors for glycoprotein synthesis in
the Golgi. Schizosaccharomyces pombe, a yeast found to add galactose residues
to cell wall polysaccharides (from UDP-galactose), was found to have specific
UDPase activity, further suggesting the requirement for such an enzyme
(Berninsone, 1994). UDP is known to be a potent inhibitor of
glycosyltransferases, and the removal of this glycosylation side product is
important in order to prevent glycosyltransferase inhibition in the lumen of
the Golgi (Khatara et al., 1974).
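
The rule stated above, namely that a diphosphate side product must be cleaved
before export whereas a monophosphate can be exported directly, may be written
out as a short Python lookup. This is only a sketch: the table DONOR_BYPRODUCT
and the function needs_diphosphatase are hypothetical names, and the donor list
is limited to sugar nucleotides named in this section.

```python
# Nucleotide released when the sugar is transferred from each donor.
DONOR_BYPRODUCT = {
    "UDP-GlcNAc": "UDP",
    "UDP-galactose": "UDP",
    "GDP-fucose": "GDP",
    "CMP-sialic acid": "CMP",
}


def needs_diphosphatase(donor):
    """True if the released nucleotide is a diphosphate (e.g. UDP or GDP)."""
    return DONOR_BYPRODUCT[donor].endswith("DP")


# Donors whose side product requires a Golgi diphosphatase before export.
requiring_cleavage = sorted(d for d in DONOR_BYPRODUCT if needs_diphosphatase(d))
```

Under this rule, the UDP- and GDP-linked donors call for a diphosphatase
activity in the Golgi, while CMP-sialic acid does not.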

Expression Of GnTs To Produce Complex N-glycans

Expression Of GnT-III To Boost Antibody Functionality
[0164] The addition of an N-acetylglucosamine to the GlcNAc1Man3GlcNAc2
structure by N-acetylglucosaminyltransferases II and III yields a so-called
bisected N-glycan GlcNAc3Man3GlcNAc2 (Fig. 3). This structure has been
implicated in greater antibody-dependent cellular cytotoxicity (ADCC) (Umana et
al. 1999). Re-engineering glycoforms of immunoglobulins expressed by mammalian
cells is a tedious and cumbersome task. Especially in the case of GnTIII, where
over-expression of this enzyme has been implicated in growth inhibition, methods
involving regulated (inducible) gene expression had to be employed to produce
immunoglobulins with bisected N-glycans (Umana et al. 1999a, 1999b).
[0165] Accordingly, in another embodiment, the invention provides systems and
methods for producing human-like N-glycans having bisecting N-acetylglucosamine
(GlcNAcs) on the core mannose structure. In a preferred embodiment, the
invention provides a system and method for producing immunoglobulins having
bisected N-glycans. The systems and methods described herein will not suffer
from previous problems, e.g., cytotoxicity associated with overexpression of
GnTIII or ADCC, as the host cells of the invention are engineered and selected
to be viable and preferably robust cells which produce N-glycans having
substantially modified human-type glycoforms such as GlcNAc2Man3GlcNAc2. Thus,
addition of a bisecting N-acetylglucosamine in a host cell of the invention
will have a negligible effect on the growth phenotype or viability of those
host cells.

[0166] In addition, previous work (Umana) has shown that there is no linear
correlation between GnTIII expression levels and the degree of ADCC. Finding
the optimal expression level in mammalian cells and maintaining it throughout an
FDA-approved fermentation process seems to be a challenge. However, in cells of
the invention, such as fungal cells, finding a promoter of appropriate strength
to establish a robust, reliable and optimal GnTIII expression level is a
comparatively easy task for one of skill in the art.

[0167] A host cell such as a yeast strain capable of producing glycoproteins
with bisecting N-glycans is engineered according to the invention by
introducing into the host cell a GnTIII activity (Example 6). Preferably, the
host cell is transformed with a nucleic acid that encodes GnTIII (see, e.g.,
Fig. 32) or a domain thereof having enzymatic activity, optionally fused to a
heterologous cell signal targeting peptide (e.g., using the libraries and
associated methods of the invention). Host cells engineered to express GnTIII
will produce higher antibody titers than mammalian cells are capable of. They
will also produce antibodies with higher potency with respect to ADCC.
[0168] Antibodies produced by mammalian cell lines transfected with GnTIII have
been shown to be as effective as antibodies produced by non-transfected cell
lines, but at a 10-20 fold lower concentration (Davies et al. 2001). An
increase in productivity of the production vehicle of the invention over
mammalian systems by a factor of twenty, together with a ten-fold increase in
potency, will result in a net productivity improvement of two hundred-fold. The
invention thus provides a system and method for producing high titers of an
antibody having high potency (e.g., up to several orders of magnitude more
potent than what can currently be produced). The system and method is safe and
provides high potency antibodies at low cost in short periods of time. Host
cells engineered to express GnTIII according to the invention produce
immunoglobulins having bisected N-glycans at rates of at least 50 mg/liter/day
to at least 500 mg/liter/day. In addition, each immunoglobulin (Ig) molecule
(comprising bisecting GlcNAcs) is more potent than the same Ig molecule
produced without bisecting GlcNAcs.
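
The net improvement quoted above is the product of the two fold-changes. The
following Python lines merely restate that arithmetic; the function name
net_improvement is a hypothetical label, and the fold-changes are the figures
given in the text rather than measured values.

```python
def net_improvement(titer_fold, potency_fold):
    """Combined gain when titer and potency gains are independent and multiply."""
    return titer_fold * potency_fold


# Twenty-fold productivity gain combined with a ten-fold potency gain.
combined = net_improvement(20, 10)
```

If the potency gain were instead taken at the upper end of the 10-20 fold range
cited from Davies et al., the same product would be 400.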

Cloning and expression of GnT-IV and GnT-V

[0169] All branching structures in complex N-glycans are synthesized on a
common core pentasaccharide (Man3GlcNAc2, or Man alpha1-6(Man alpha1-3)Man
beta1-4 GlcNAc beta1-4 GlcNAc beta1-4) by N-acetylglucosamine transferases
(GnTs) -I to -VI (Schachter H et al. (1989) Methods Enzymol. 179:351-97).
Current understanding of the biosynthesis of more highly branched N-glycans
suggests that after the action of GnTII (generation of GlcNAc2Man3GlcNAc2
structures), GnTIV transfers GlcNAc from UDP-GlcNAc in beta1,4 linkage to the
Man alpha1,3 Man beta1,4 arm of GlcNAc2Man3GlcNAc2 N-glycans (Allen SD et al.
(1984) J Biol Chem. Jun 10;259(11):6984-90; and Gleeson PA and Schachter H
(1983) J Biol Chem. May 25;258(10):6162-73), resulting in a triantennary
agalacto sugar chain. This N-glycan (GlcNAc beta1-2 Man alpha1-6(GlcNAc beta1-2
Man alpha1-3) Man beta1-4 GlcNAc beta1-4 GlcNAc beta1,4 Asn) is a common
substrate for GnT-III and -V, leading to the synthesis of bisected, tri- and
tetra-antennary structures. The action of GnTIII results in a bisected
N-glycan, whereas GnTV catalyzes the addition of beta1-6 GlcNAc to the alpha1-6
mannosyl core, creating the beta1-6 branch. Addition of galactose and sialic
acid to these branches leads to the generation of a fully sialylated complex
N-glycan.

[0170] Branched complex N-glycans have been implicated in the physiological
activity of therapeutic proteins, such as human erythropoietin (hEPO). Human
EPO having bi-antennary structures has been shown to have a low activity,
whereas hEPO having tetra-antennary structures resulted in slower clearance from
the bloodstream and thus in higher activity (Misaizu T et al. (1995) Blood Dec
1;86(11):4097-104).

[0171] With DNA sequence information, the skilled worker can clone DNA
molecules encoding GnT IV and/or V activities (Example 6; Figs. 33 and 34).
Using standard techniques well-known to those of skill in the art, nucleic acid
molecules encoding GnT IV or V (or encoding catalytically active fragments
thereof) may be inserted into appropriate expression vectors under the
transcriptional control of promoters and other expression control sequences
capable of driving transcription in a selected host cell of the invention, e.g.,
a fungal host such as Pichia sp., Kluyveromyces sp. and Aspergillus sp., as
described herein, such that one or more of these mammalian GnT enzymes may be
actively expressed in a host cell of choice for production of a human-like
complex glycoprotein.

[0172] The following are examples which illustrate the compositions and
methods of this invention. These examples should not be construed as limiting:
the examples are included for the purposes of illustration only.

EXAMPLE 1

Identification, cloning and deletion of the ALG3 gene in P. pastoris and K. lactis.
[0173] Degenerate primers were generated based on an alignment of Alg3
protein sequences from S. cerevisiae, H. sapiens, and D. melanogaster and were
used to amplify an 83 bp product from P. pastoris genomic DNA:

5'-GGTGTTTTGTTTTCTAGATCTTTGCAYTAYCARTT-3' and

5'-AGAATTTGGTGGGTAAGAATTCCARCACCAYTCRTG-3'. The resulting
PCR product was cloned into the pCR2.1 vector (Invitrogen, Carlsbad, CA) and
sequence analysis revealed homology to known ALG3/RHK1/NOT56 homologs
(Genbank NC_001134.2, AF309689, NC_003424.1). Subsequently, 1929 bp
upstream and 2738 bp downstream of the initial PCR product were amplified from
a P. pastoris genomic DNA library (Boehm, T. Yeast 1999 May;15(7):563-72)
using the internal oligonucleotides

5'-CCTAAGCTGGTATGCGTTCTCTTTGCCATATC-3' and

5'-GCGGCATAAACAATAATAGATGCTATAAAG-3' along with T3

(5'-AATTAACCCTCACTAAAGGG-3') and T7 (5'-GTAATACGACTCACTATAGGGC-3')
(Integrated DNA Technologies, Coralville, IA)
in the backbone of the library bearing plasmid lambda ZAP II (Stratagene, La
Jolla, CA). The resulting fragments were cloned into the pCR2.1-TOPO vector
(Invitrogen) and sequenced. From this sequence, a 1395 bp ORF was identified
that encodes a protein with 35% identity and 53% similarity to the S. cerevisiae
ALG3 gene (using BLAST programs). The gene was named PpALG3.
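
The degenerate positions in the two Alg3 primers above use IUPAC ambiguity codes (Y = C or T, R = A or G). As a minimal sketch, and not part of the original protocol, the following Python snippet counts and enumerates the individual oligonucleotides each degenerate primer represents. The function names are illustrative assumptions; the primer strings are the sequences quoted above with spacing removed.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes. Only Y and R occur in the Alg3 primers,
# but the full table is given so the helpers work for any degenerate primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as quoted in the text (5' to 3').
ALG3_FWD = "GGTGTTTTGTTTTCTAGATCTTTGCAYTAYCARTT"
ALG3_REV = "AGAATTTGGTGGGTAAGAATTCCARCACCAYTCRTG"


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in (("forward", ALG3_FWD), ("reverse", ALG3_REV)):
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Each primer carries three two-fold degenerate positions, so each represents a pool of eight oligonucleotides.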

[0174] The sequence of PpALG3 was used to create a set of primers to generate a
deletion construct of the PpALG3 gene by PCR overlap (Davidson et al, 2002
Microbiol 148(Pt 8):2607-15). Primers below were used to amplify 1 kb regions
5' and 3' of the PpALG3 ORF and the KAN^R gene, respectively:

RCD142 (5'-CCACATCATCCGTGCTACATATAG-3'),

RCD144 (5'-ACGAGGCAAGCTAAACAGATCTCGAAGTATCGAGGGTTATCCAG-3'),

RCD145 (5'-CCATCCAGTGTCGAAAACGAGCCAATGGTTCATGTCTATAAATC-3'),

RCD147 (5'-AGCCTCAGCGCCAACAAGCGATGG-3'),

RCD143 (5'-CTGGATAACCCTCGATACTTCGAGATCTGTTTAGCTTGCCTCGT-3'), and

RCD146 (5'-GATTTATAGACATGAACCATTGGCTCGTTTTCGACACTGGATGG-3').

Subsequently, primers RCD142 and RCD147 were used to overlap the three
resulting PCR products into a single 3.6 kb alg3::KAN^R deletion allele.
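
In a PCR overlap design, the primers at each junction are expected to be reverse complements of one another so that adjacent fragments can anneal. The sketch below is an illustrative check, not part of the original method. It assumes that RCD143/RCD144 and RCD145/RCD146 are the two junction pairs, and it reads the one illegible base in RCD146 as G, which is the value implied by complementarity to RCD145.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Junction primers as quoted in the text (5' to 3'), spacing removed.
RCD143 = "CTGGATAACCCTCGATACTTCGAGATCTGTTTAGCTTGCCTCGT"
RCD144 = "ACGAGGCAAGCTAAACAGATCTCGAAGTATCGAGGGTTATCCAG"
RCD145 = "CCATCCAGTGTCGAAAACGAGCCAATGGTTCATGTCTATAAATC"
# Assumption: the garbled base near the 3' end is read as G.
RCD146 = "GATTTATAGACATGAACCATTGGCTCGTTTTCGACACTGGATGG"

JUNCTIONS = [("RCD144", RCD144, "RCD143", RCD143),
             ("RCD145", RCD145, "RCD146", RCD146)]

if __name__ == "__main__":
    for name_a, seq_a, name_b, seq_b in JUNCTIONS:
        ok = reverse_complement(seq_a) == seq_b
        print(f"{name_a} / {name_b}: fully complementary = {ok}")
```

Both pairs are complementary over their full 44 nt length, which is consistent with their use as overlapping junction primers.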

Identification, cloning and deletion of the ALG3 gene in K. lactis.

[0175] The ALG3p sequences from S. cerevisiae, Drosophila melanogaster,
Homo sapiens, etc. were aligned with K. lactis sequences (PENDANT EST
database). Regions of high homology that were in common homologs but distinct
in exact sequence from the homologs were used to create pairs of degenerate
primers that were directed against genomic DNA from the K. lactis strain MG1/2
(Bianchi et al, 1987). In the case of ALG3, PCR amplification with primers KAL-1
(5'-ATCCTTTACCGATGCTGTAT-3') and KAL-2
(5'-ATAACAGTATGTGTTACACGCGTGTAG-3') resulted in a product that was
cloned and sequenced and the predicted translation was shown to have a high
degree of homology to Alg3p proteins (>50% to S. cerevisiae Alg3p).

[0176] The PCR product was used to probe a Southern blot of genomic DNA 
from K. lactis strain (MG1/2) with high stringency (Sambrook et al, 1989). 
Hybridization was observed in a pattern consistent with a single gene. This 
Southern blot was used to map the genomic loci. Genomic fragments were cloned 
by digesting genomic DNA and ligating those fragments in the appropriate size
range into pUC19 to create a K. lactis subgenomic library. This subgenomic
library was transformed into E. coli and several hundred clones were tested by
colony PCR using primers KAL-1 and KAL-2. The clones containing the
predicted KlALG3 and KlALG61 genes were sequenced and open reading frames
identified.

[0177] Primers for construction of an alg3::NAT^R deletion allele, using a PCR
overlap method (Davidson et al, 2002), were designed and the resulting deletion
allele was transformed into two K. lactis strains and NAT-resistant colonies
selected. These colonies were screened by PCR and transformants were obtained
in which the ALG3 ORF was replaced with the och1::NAT^R mutant allele.

EXAMPLE 2 

Generation of an alg3/och1 mutant strain expressing an α-1,2-Mannosidase,
GnTI and GnTII for production of a human-like glycoprotein.

[0178] The 1215 bp open reading frame of the P. pastoris OCH1 gene as well as
2685 bp upstream and 1175 bp downstream was amplified by PCR (B. K. Choi et
al., submitted to Proc. Natl. Acad. Sci. USA 2002; see also WO 02/00879; each of
which is incorporated herein by reference), cloned into the pCR2.1-TOPO vector
(Invitrogen) and designated pBK9. To create an och1 knockout strain containing
multiple auxotrophic markers, 100 µg of pJN329, a plasmid containing an
och1::URA3 mutant allele flanked with SfiI restriction sites, was digested with SfiI
and used to transform P. pastoris strain JC308 (Cereghino et al. Gene 263 (2001)
159-169) by electroporation. Following incubation on defined medium lacking
uracil for 10 days at room temperature, 1000 colonies were picked and re-streaked.
URA+ clones that were unable to grow at 37°C, but grew at room temperature,
were subjected to colony PCR to test for the correct integration of the och1::URA3
mutant allele. One clone that exhibited the expected PCR pattern was designated
YJN153. The Kringle 3 domain of human plasminogen (K3) was used as a model
protein. A Neo^R marked plasmid containing the K3 gene was transformed into
strain YJN153 and a resulting strain, expressing K3, was named BK64-1 (B. K.
Choi et al, submitted to Proc. Natl. Acad. Sci. USA 2002).
[0179] Plasmid pPB103, containing the Kluyveromyces lactis MNN2-2 gene
encoding a Golgi UDP-N-acetylglucosamine transporter, was constructed by
cloning a blunt BglII-HindIII fragment from vector pDL02 (Abeijon et al. (1996)
Proc. Natl. Acad. Sci. U.S.A. 93:5963-5968) into BglII and BamHI digested and
blunt ended pBLADE-SX containing the P. pastoris ADE1 gene (Cereghino et al.
(2001) Gene 263:159-169). This plasmid was linearized with EcoNI and
transformed into strain BK64-1 by electroporation and one strain confirmed to
contain the MNN2-2 by PCR analysis was named PBP1.

[0180] A library of mannosidase constructs was generated, comprising in-frame
fusions of the leader domains of several type I or type II membrane proteins from
S. cerevisiae and P. pastoris fused with the catalytic domains of several α-1,2-
mannosidase genes from human, mouse, fly, worm and yeast sources (see, e.g.,
WO02/00879, incorporated herein by reference). This library was created in a P.
pastoris HIS4 integration vector and screened by linearizing with SalI,
transforming by electroporation into strain PBP1, and analyzing the glycans
released from the K3 reporter protein. One active construct chosen was a chimera
of the 988-1296 nucleotides (C-terminus) of the yeast SEC12 gene fused with an
N-terminal deletion of the mouse α-1,2-mannosidase IA (MmMannIA) gene, which
was missing the first 187 nucleotides. A P. pastoris strain expressing this construct
was named PBP2.

[0181] A library of GnTI constructs was generated, comprising in-frame fusions
of the same leader library with the catalytic domains of GnTI genes from human,
worm, frog and fly sources (WO 02/00879). This library was created in a P.
pastoris ARG4 integration vector and screened by linearizing with AatII,
transforming by electroporation into strain PBP2, and analyzing the glycans
released from K3. One active construct chosen was a chimera of the first 120 bp of
the S. cerevisiae MNN9 gene fused to a deletion of the human GnTI gene, which
was missing the first 154 bp. A P. pastoris strain expressing this construct was
named PBP3.

[0182] Subsequently, a P. pastoris alg3::KAN^R deletion construct was generated
as described above. Approximately 5µg of the resulting PCR product was
transformed into strain PBP3 and colonies were selected on YPD medium
containing 200µg/ml G418. One strain out of 20 screened by PCR was confirmed
to contain the correct integration of the alg3::KAN^R mutant allele and lack the
wild-type allele. This strain was named RDP27.
[0183] Finally, a library of GnTII constructs was generated, which was
comprised of in-frame fusions of the leader library with the catalytic domains of
GnTII genes from human and rat sources (WO 02/00879). This library was
created in a P. pastoris integration vector containing the NAT^R gene conferring
resistance to the drug nourseothricin. The library plasmids were linearized with
EcoRI, transformed into strain RDP27 by electroporation, and the resulting strains
were screened by analysis of the released glycans from purified K3.

Materials 

[0184] MOPS, sodium cacodylate, manganese chloride, UDP-galactose and
CMP-N-acetylneuraminic acid were from Sigma. TFA was from Aldrich.
Recombinant rat α2,6-sialyltransferase from Spodoptera frugiperda and β1,4-
galactosyltransferase from bovine milk were from Calbiochem. Protein N-
glycosidase F, mannosidases, and oligosaccharides were from Glyko (San Rafael,
CA). DEAE ToyoPearl resin was from TosoHaas. Metal chelating "HisBind"
resin was from Novagen (Madison, WI). 96-well lysate-clearing plates were from
Promega (Madison, WI). Protein-binding 96-well plates were from Millipore
(Bedford, MA). Salts and buffering agents were from Sigma (St. Louis, MO).
MALDI matrices were from Aldrich (Milwaukee, WI).

Protein Purification 

[0185] Kringle 3 was purified using a 96-well format on a Beckman BioMek
2000 sample-handling robot (Beckman/Coulter, Rancho Cucamonga, CA). Kringle
3 was purified from expression media using a C-terminal hexa-histidine tag. The
robotic purification is an adaptation of the protocol provided by Novagen for their
HisBind resin. Briefly, a 150µL settled volume of resin is poured into the
wells of a 96-well lysate-binding plate, washed with 3 volumes of water and
charged with 5 volumes of 50mM NiSO4 and washed with 3 volumes of binding
buffer (5mM imidazole, 0.5M NaCl, 20mM Tris-HCl pH7.9). The protein
expression media is diluted 3:2, media/PBS (60mM PO4, 16mM KCl, 822mM
NaCl pH7.4) and loaded onto the columns. After draining, the columns are
washed with 10 volumes of binding buffer and 6 volumes of wash buffer (30mM
imidazole, 0.5M NaCl, 20mM Tris-HCl pH7.9) and the protein is eluted with 6
volumes of elution buffer (1M imidazole, 0.5M NaCl, 20mM Tris-HCl pH7.9).
The eluted glycoproteins are evaporated to dryness by lyophilization.
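
The purification protocol above is stated in column volumes relative to a 150 µL settled resin bed. As a rough planning aid, and not part of the original protocol, the sketch below converts those volumes into microliters per well and milliliters per 96-well plate. It assumes one "volume" equals the settled resin volume and ignores dead volume and pipetting excess; the function names are illustrative.

```python
RESIN_UL = 150   # settled resin volume per well, as stated in the protocol
WELLS = 96       # one full plate

# (reagent, column volumes), in the order the protocol applies them.
STEPS = [
    ("water", 3),
    ("50 mM NiSO4", 5),
    ("binding buffer", 3),    # equilibration after charging
    ("binding buffer", 10),   # wash after sample loading
    ("wash buffer", 6),
    ("elution buffer", 6),
]


def per_well_ul(resin_ul=RESIN_UL, steps=STEPS):
    """Total microliters of each reagent needed for one well."""
    totals = {}
    for reagent, column_volumes in steps:
        totals[reagent] = totals.get(reagent, 0) + column_volumes * resin_ul
    return totals


def per_plate_ml(totals, wells=WELLS):
    """Scale per-well microliters to milliliters for a whole plate."""
    return {reagent: ul * wells / 1000 for reagent, ul in totals.items()}


if __name__ == "__main__":
    well = per_well_ul()
    plate = per_plate_ml(well)
    for reagent in well:
        print(f"{reagent}: {well[reagent]} uL/well, {plate[reagent]:.1f} mL/plate")
```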

Release of N-linked Glycans 

[0186] The glycans are released and separated from the glycoproteins by a
modification of a previously reported method (Papac, et al. A. J. S. (1998)
Glycobiology 8, 445-454). The wells of a 96-well MultiScreen IP (Immobilon-P
membrane) plate (Millipore) are wetted with 100µL of methanol, washed with
3X150µL of water and 50µL of RCM buffer (8M urea, 360mM Tris, 3.2mM
EDTA pH8.6), draining with gentle vacuum after each addition. The dried protein
samples are dissolved in 30µL of RCM buffer and transferred to the wells
containing 10µL of RCM buffer. The wells are drained and washed twice with
RCM buffer. The proteins are reduced by addition of 60µL of 0.1M DTT in RCM
buffer for 1hr at 37°C. The wells are washed three times with 300µL of water and
carboxymethylated by addition of 60µL of 0.1M iodoacetic acid for 30min in the
dark at room temperature. The wells are again washed three times with water and
the membranes blocked by the addition of 100µL of 1% PVP 360 in water for 1hr
at room temperature. The wells are drained and washed three times with 300µL of
water and deglycosylated by the addition of 30µL of 10mM NH4HCO3 pH 8.3
containing one milliunit of N-glycanase (Glyko). After 16 hours at 37°C, the
solution containing the glycans was removed by centrifugation and evaporated to
dryness.

Matrix Assisted Laser Desorption Ionization Time of Flight Mass
Spectrometry

[0187] Molecular weights of the glycans were determined using a Voyager DE
PRO linear MALDI-TOF (Applied Biosciences) mass spectrometer using delayed
extraction. The dried glycans from each well were dissolved in 15µL of water and
0.5µL spotted on stainless steel sample plates and mixed with 0.5µL of S-DHB
matrix (9mg/mL of dihydroxybenzoic acid, 1mg/mL of 5-methoxysalicylic acid in
1:1 water/acetonitrile 0.1% TFA) and allowed to dry.

[0188] Ions were generated by irradiation with a pulsed nitrogen laser (337nm)
with a 4ns pulse time. The instrument was operated in the delayed extraction mode
with a 125ns delay and an accelerating voltage of 20kV. The grid voltage was
93.00%, guide wire voltage was 0.10%, the internal pressure was less than 5X10^-7
torr, and the low mass gate was 875Da. Spectra were generated from the sum of
100-200 laser pulses and acquired with a 2 GHz digitizer. Man5 oligosaccharide
was used as an external molecular weight standard. All spectra were generated
with the instrument in the positive ion mode. The estimated mass accuracy of the
spectra was 0.5%.



Materials: 

[0189] MOPS, sodium cacodylate, manganese chloride, UDP-galactose and
CMP-N-acetylneuraminic acid were from Sigma, Saint Louis, MO. Trifluoroacetic
acid (TFA) was from Sigma/Aldrich, Saint Louis, MO. Recombinant rat alpha-2,6-
sialyltransferase from Spodoptera frugiperda and beta-1,4-galactosyltransferase
from bovine milk were from Calbiochem, San Diego, CA.

β-N-acetylhexosaminidase Digestion

[0190] The glycans were released and separated from the glycoproteins by a
modification of a previously reported method (Papac, et al. A. J. S. (1998)
Glycobiology 8, 445-454). After the proteins were reduced and carboxymethylated,
and the membranes blocked, the wells were washed three times with water. The
protein was deglycosylated by the addition of 30 µl of 10 mM NH4HCO3 pH 8.3
containing one milliunit of N-glycanase (Glyko, Novato, CA). After 16 hr at 37°C,
the solution containing the glycans was removed by centrifugation and evaporated
to dryness. The glycans were then dried in SC210A speed vac (Thermo Savant,
Holbrook, NY). The dried glycans were put in 50 mM NH4Ac pH 5.0 at 37°C
overnight and 1mU of hexosaminidase (Glyko, Novato, CA) was added.

Galactosyltransferase Reaction 

[0191] Approximately 2mg of protein (r-K3:hPg [PBP6-5]) was purified by
nickel-affinity chromatography, extensively dialyzed against 0.1% TFA, and
lyophilized to dryness. The protein was redissolved in 150µL of 50mM MOPS,
20mM MnCl2, pH7.4. After addition of 32.5µg (533nmol) of UDP-galactose and
4mU of β1,4-galactosyltransferase, the sample was incubated at 37°C for 18
hours. The samples were then dialyzed against 0.1% TFA for analysis by MALDI-
TOF mass spectrometry.

[0192] The spectrum of the protein reacted with galactosyltransferase showed an
increase in mass consistent with the addition of two galactose moieties when
compared with the spectrum of a similar protein sample incubated without enzyme.
Protein samples were next reduced, carboxymethylated and deglycosylated with
PNGase F. The recovered N-glycans were analyzed by MALDI-TOF mass
spectrometry. The mass of the predominant glycan from the galactosyltransferase
reacted protein was greater than that of the control glycan by a mass consistent
with the addition of two galactose moieties (325.4 Da).

Sialyltransferase Reaction

[0193] After resuspending the (galactosyltransferase reacted) proteins in 10µL of
50mM sodium cacodylate buffer pH6.0, 300µg (488nmol) of CMP-N-
acetylneuraminic acid (CMP-NANA) dissolved in 15µL of the same buffer, and
5µL (2mU) of recombinant α-2,6 sialyltransferase were added. After incubation at
37°C for 15 hours, an additional 200µg of CMP-NANA and 1mU of
sialyltransferase were added. The protein samples were incubated for an additional
8 hours and then dialyzed and analyzed by MALDI-TOF-MS as above.

[0194] The spectrum of the glycoprotein reacted with sialyltransferase showed an
increase in mass when compared with that of the starting material (the protein after
galactosyltransferase reaction). The N-glycans were released and analyzed as
above. The increase in mass of the two ion-adducts of the predominant glycan was
consistent with the addition of two sialic acid residues (580 and 583Da).
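
The mass increases reported for the galactosyltransferase and sialyltransferase reactions can be compared with the theoretical mass of the added residues. The sketch below is an illustrative calculation, not part of the original analysis. It assumes average atomic masses and that each added sugar contributes its residue mass (the free sugar minus one water): C6H10O5 for galactose and C11H17NO8 for N-acetylneuraminic acid.

```python
# Average atomic masses (assumed standard values, rounded to 3 decimals).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Residue compositions: free sugar minus one water lost on glycosidic linkage.
HEXOSE_RESIDUE = {"C": 6, "H": 10, "O": 5}          # galactose, C6H12O6 - H2O
NEUAC_RESIDUE = {"C": 11, "H": 17, "N": 1, "O": 8}  # NeuAc, C11H19NO9 - H2O


def residue_mass(formula):
    """Average mass in Da of a residue given its elemental composition."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


# Expected shifts for adding two residues per glycan.
EXPECTED_2_GAL = 2 * residue_mass(HEXOSE_RESIDUE)
EXPECTED_2_NEUAC = 2 * residue_mass(NEUAC_RESIDUE)

# Observed shifts as reported in the text (Da).
OBSERVED_2_GAL = 325.4
OBSERVED_2_NEUAC = (580.0, 583.0)

if __name__ == "__main__":
    print(f"two Gal: expected {EXPECTED_2_GAL:.2f}, observed {OBSERVED_2_GAL}")
    for obs in OBSERVED_2_NEUAC:
        print(f"two NeuAc: expected {EXPECTED_2_NEUAC:.2f}, observed {obs}")
```

Under these assumptions the expected shifts are about 324.3 Da for two galactose residues and about 582.5 Da for two sialic acid residues, so the reported values differ from theory by only a few daltons.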



EXAMPLE 3

Identification, cloning and deletion of the
ALG9 and ALG12 genes in P. pastoris

[0195] Similar to Example 1, the ALG9p and ALG12 sequences, respectively,
from S. cerevisiae, Drosophila melanogaster, Homo sapiens, etc., are aligned and
regions of high homology are used to design degenerate primers. These primers
are employed in a PCR reaction on genomic DNA from P. pastoris. The
resulting initial PCR product is subcloned, sequenced and used to probe a Southern
blot of genomic DNA from P. pastoris with high stringency (Sambrook et al.,
1989). Hybridization is observed. This Southern blot is used to map the genomic
loci. Genomic fragments are cloned by digesting genomic DNA and ligating those
fragments in the appropriate size-range into pUC19 to create a P. pastoris
subgenomic library. This subgenomic library is transformed into E. coli and
several hundred clones tested by colony PCR, using primers designed based on the
sequence of the initial PCR product. The clones containing the predicted genes are
sequenced and open reading frames identified. Primers for construction of an
alg9::NAT^R deletion allele, using a PCR overlap method (Davidson et al., 2002),
are designed. The resulting deletion allele is transformed into two P. pastoris
strains and NAT resistant colonies are selected. These colonies are screened by
PCR and transformants obtained in which the ALG9 ORF is replaced with the
och1::NAT^R mutant allele. See generally, Cipollo et al. Glycobiology 2002
(12)11:749-762; Chantret et al. J. Biol. Chem. Jul. 12, 2002 (277)28:25815-25822;
Cipollo et al. J. Biol. Chem. Feb. 11, 2000 (275)6:4267-4277; Burda et al. Proc.
Natl. Acad. Sci. U.S.A. July 1996 (93):7160-7165; Karaoglu et al. Biochemistry
2001, 40, 12193-12206; Grimme et al. J. Biol. Chem. July 20, 2001
(276)29:27731-27739; Verostek et al. J. Biol. Chem. June 5, 1993 (268)16:12095-
12103; Huffaker et al. Proc. Natl. Acad. Sci. U.S.A. Dec. 1983 (80):7466-7470.

EXAMPLE 4 

Identification, cloning and expression of Alpha 1,2-3 Mannosidase From
Xanthomonas Manihotis

[0196] The alpha 1,2-3 Mannosidase from Xanthomonas Manihotis has two
activities: an alpha-1,2 and an alpha-1,3 mannosidase. The methods of the
invention may also use two independent mannosidases having these activities,
which may be similarly identified and cloned from a selected organism of interest.

[0197] As described by Landry et al., alpha-mannosidases can be purified from
Xanthomonas sp., such as Xanthomonas manihotis. X. manihotis can be purchased
from the American Type Culture Collection (ATCC catalog number 49764)
(Xanthomonas axonopodis Starr and Garces pathovar manihotis deposited as
Xanthomonas manihotis (Arthaud-Berthet) Starr). Enzymes are purified from
crude cell-extracts as previously described (Wong-Madden, S.T. and Landry, D.
(1995) Purification and characterization of novel glycosidases from the bacterial
genus Xanthomonas; and Landry, D. US Patent US 6,300,113 B1 Isolation and
composition of novel Glycosidases). After purification of the mannosidase, one of
several methods is used to obtain peptide sequence tags (see, e.g., Quadroni
M et al. (2000) A method for the chemical generation of N-terminal peptide
sequence tags for rapid protein identification. Anal Chem (2000) Mar
1;72(5):1006-14; Wilkins MR et al. Rapid protein identification using N-terminal
"sequence tag" and amino acid analysis. Biochem Biophys Res Commun. (1996)
Apr 25;221(3):609-13; and Tsugita A. (1987) Developments in protein
microsequencing. Adv Biophys (1987) 23:81-113).

[0198] Sequence tags generated using a method above are then used to generate
sets of degenerate primers using methods well-known to the skilled worker.
Degenerate primers are used to prime DNA amplification in polymerase chain
reactions (e.g., using Taq polymerase kits according to manufacturers'
instructions) to amplify DNA fragments. The amplified DNA fragments are used
as probes to isolate DNA molecules comprising the gene encoding a desired
mannosidase, e.g., using standard Southern DNA hybridization techniques to
identify and isolate (clone) genomic pieces encoding the enzyme of interest. The
genomic DNA molecules are sequenced and putative open reading frames and
coding sequences are identified. A suitable expression construct encoding for the
glycosidase of interest can then be generated using methods described herein and
well-known in the art.
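
When degenerate primers are designed from a peptide sequence tag, a common consideration is to pick the stretch of residues whose back-translation is least degenerate. The sketch below illustrates that idea under stated assumptions. It uses the codon counts of the standard genetic code, and the peptide tag is a made-up example, not a sequence from Xanthomonas or from this disclosure.

```python
# Number of codons per amino acid in the standard genetic code (sums to 61).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def back_translation_degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    count = 1
    for residue in peptide:
        count *= CODON_COUNT[residue]
    return count


def least_degenerate_window(tag, length):
    """Return (window, degeneracy) for the least degenerate stretch of a tag."""
    windows = [tag[i:i + length] for i in range(len(tag) - length + 1)]
    best = min(windows, key=back_translation_degeneracy)
    return best, back_translation_degeneracy(best)


if __name__ == "__main__":
    HYPOTHETICAL_TAG = "LSRMWHQNF"  # illustrative only, not a real sequence tag
    window, deg = least_degenerate_window(HYPOTHETICAL_TAG, 6)
    print(f"best 6-residue window: {window} ({deg} encoding sequences)")
```

Stretches rich in Met and Trp (one codon each) and poor in Leu, Ser and Arg (six codons each) give the smallest primer pools, which is why they are usually preferred as primer sites.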

[0199] Nucleic acid fragments comprising sequences encoding alpha 1,2-3
mannosidase activity (or catalytically active fragments thereof) are cloned into
appropriate expression vectors for expression, and preferably targeted expression,
of these activities in an appropriate host cell according to the methods set forth
herein.

EXAMPLE 5

Identification, cloning and expression of the ALG6 gene in P. pastoris

[0200] Similar to Example 1, the ALG6p sequences from S. cerevisiae,
Drosophila melanogaster, Homo sapiens, etc., are aligned and regions of high
homology are used to design degenerate primers. These primers are employed in a
PCR reaction on genomic DNA from P. pastoris. The resulting initial PCR
product is subcloned, sequenced and used to probe a Southern blot of genomic
DNA from P. pastoris with high stringency (Sambrook et al, 1989). Hybridization
is observed. This Southern blot is used to map the genomic loci. Genomic
fragments are cloned by digesting genomic DNA and ligating those fragments in
the appropriate size-range into pUC19 to create a P. pastoris subgenomic library.
This subgenomic library is transformed into E. coli and several hundred clones are
tested by colony PCR, using primers designed based on the sequence of the initial
PCR product. The clones containing the predicted genes are sequenced and open
reading frames identified. Primers for construction of an alg6::NAT^R deletion
allele, using a PCR overlap method (Davidson et al, 2002), are designed and the
resulting deletion allele is transformed into two P. pastoris strains and NAT
resistant colonies selected. These colonies are screened by PCR and transformants
are obtained in which the ALG6 ORF is replaced with the och1::NAT^R mutant
allele. See, e.g., Imbach et al. Proc. Natl. Acad. Sci. U.S.A. June 1999 (96):6982-
6987.

[0201] Nucleic acid fragments comprising sequences encoding Alg6p (or
catalytically active fragments thereof) are cloned into appropriate expression
vectors for expression, and preferably targeted expression, of these activities in an
appropriate host cell according to the methods set forth herein. The cloned ALG6
gene can be brought under the control of any suitable promoter to achieve
overexpression. Even expression of the gene under the control of its own promoter
is possible. Expression from multicopy plasmids will generate high levels of
expression ("overexpression").



EXAMPLE 6 

Cloning and Expression Of GnTIII To Produce

Bisecting GlcNAcs Which Boost Antibody Functionality

A. Background

[0202] The addition of an N-acetylglucosamine to the GlcNAc2Man3GlcNAc2
structure by N-acetylglucosaminyltransferase III yields a so-called bisected N-
glycan (see Figure 3). This structure has been implicated in greater antibody-
dependent cellular cytotoxicity (ADCC) (Umana et al. 1999).

[0203] A host cell such as a yeast strain capable of producing glycoproteins with
bisected N-glycans is engineered according to the invention, by introducing into
the host cell a GnTIII activity. Preferably, the host cell is transformed with a
nucleic acid that encodes GnTIII (e.g., a mammalian GnTIII such as the murine
GnTIII shown in Fig. 32) or a domain thereof having enzymatic activity, optionally
fused to a heterologous cell signal targeting peptide (e.g., using the libraries and
associated methods of the invention).

[0204] IgGs consist of two heavy chains (VH, CH1, CH2 and CH3 in Figure 30),
interconnected in the hinge region through three disulfide bridges, and two light
chains (VL, CL in Figure 30). The light chains (domains VL and CL) are linked by
another disulfide bridge to the CH1 portion of the heavy chain and together with the
CH1 and VH fragment make up the so-called Fab region. Antigens bind to the
terminal portion of the Fab region. The Fc region of IgGs consists of the CH3, the
CH2 and the hinge region and is responsible for the exertion of so-called effector
functions (see below).

[0205] The primary function of antibodies is binding to an antigen. However,
unless binding to the antigen directly inactivates the antigen (such as in the case of
bacterial toxins), mere binding is meaningless unless so-called effector-functions
are triggered. Antibodies of the IgG subclass exert two major effector-functions:
the activation of the complement system and induction of phagocytosis. The
complement system consists of a complex group of serum proteins involved in
controlling inflammatory events, in the activation of phagocytes and in the lytic
destruction of cell membranes. Complement activation starts with binding of the
C1 complex to the Fc portion of two IgGs in close proximity. C1 consists of one
molecule, C1q, and two molecules, C1r and C1s. Phagocytosis is initiated through
an interaction between the IgG's Fc fragment and Fc-gamma-receptors (Fc-γRI, II
and III in Figure 30). Fc receptors are primarily expressed on the surface of
effector cells of the immune system, in particular macrophages, monocytes,
myeloid cells and dendritic cells.
[0206] The CH2 portion harbors a conserved N-glycosylation site at asparagine
297 (Asn297). The Asn297 N-glycans are highly heterogeneous and are known to
affect Fc receptor binding and complement activation. Only a minority (i.e., about
15-20%) of IgGs bears a disialylated, and 3-10% have a monosialylated N-glycan
(reviewed in Jefferis, R., Glycosylation of human IgG Antibodies. BioPharm,
2001). Interestingly, the minimal N-glycan structure shown to be necessary for
fully functional antibodies capable of complement activation and Fc receptor
binding is a pentasaccharide with terminal N-acetylglucosamine residues
(GlcNAc2Man3) (reviewed in Jefferis, R., Glycosylation of human IgG Antibodies.
BioPharm, 2001). Antibodies with less than a GlcNAc2Man3 N-glycan or no N-
glycosylation at Asn297 might still be able to bind an antigen but most likely will
not activate the crucial downstream events such as phagocytosis and complement
activation. In addition, antibodies with fungal-type N-glycans attached to Asn297
will in all likelihood elicit an immune response in a mammalian organism which
will render that antibody useless as a therapeutic glycoprotein.

B. Cloning And Expression Of GnTIII

The DNA fragment encoding part of the mouse GnTIII protein lacking the TM
domain is PCR amplified from murine (or other mammalian) genomic DNA using
forward 5'-TCCTGGCGCGCCTTCCCGAGAGAACTGGCCTCCCTC-3' and
reverse 5'-AATTAATTAACCCTAGCCCTCCGCTGTATCCAACTTG-3'
primers. Those primers include AscI and PacI restriction sites that will be used for
cloning into the vector suitable for the fusion with the leader library.

The nucleic acid and amino acid sequence of murine GnTIII is shown in Fig. 32.
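
The restriction sites carried by the GnTIII amplification primers can be located with a simple string search. The sketch below is illustrative only. It assumes the standard recognition sequences GGCGCGCC for AscI and TTAATTAA for PacI, and uses the primer sequences quoted above with OCR spacing removed.

```python
# Recognition sequences (assumed standard values for these enzymes).
SITES = {"AscI": "GGCGCGCC", "PacI": "TTAATTAA"}

# GnTIII primers as quoted in the text (5' to 3').
GNT3_FWD = "TCCTGGCGCGCCTTCCCGAGAGAACTGGCCTCCCTC"
GNT3_REV = "AATTAATTAACCCTAGCCCTCCGCTGTATCCAACTTG"


def find_sites(seq, sites=SITES):
    """Map each enzyme name to the 0-based start positions of its site in seq."""
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    print("forward primer:", find_sites(GNT3_FWD))
    print("reverse primer:", find_sites(GNT3_REV))
```

Both recognition sequences are palindromic, so searching one strand is sufficient. The forward primer carries the AscI site and the reverse primer the PacI site, each preceded by a few extra 5' bases.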

C. Cloning of immunoglobulin encoding sequences

[0207] Protocols for the cloning of the variable regions of antibodies, including
primer sequences, have been published previously. Sources of antibodies and
encoding genes can be, among others, in vitro immunized human B cells (see, e.g.,
Borreback, C.A. et al. (1988) Proc. Natl. Acad. Sci. USA 85, 3995-3999), peripheral
blood lymphocytes or single human B cells (see, e.g., Lagerkvist, A.C. et al.
(1995) Biotechniques 18, 862-869; and Terness, P. et al. (1997) Hum. Immunol. 56,
17-27) and transgenic mice containing human immunoglobulin loci, allowing the
creation of hybridoma cell-lines.

[0208] Using standard recombinant DNA techniques, antibody-encoding nucleic
acid sequences can be cloned. Sources for the genetic information encoding
immunoglobulins of interest are typically total RNA preparations from cells of
interest, such as blood lymphocytes or hybridoma cell lines. For example, by
employing a PCR-based protocol with specific primers, variable regions can be
cloned via reverse transcription initiated from a sequence-specific primer



WO 03/056914 



PCT/US02/41510 



hybridizing to the IgG CH1 domain site and a second primer encoding amino acids
111-118 of the murine kappa constant region. The VH and VK encoding cDNAs
will then be amplified as previously published (see, e.g., Graziano, R.F. et al.
(1995) J. Immunol. 155(10): p. 4996-5002; Welschof, M. et al. (1995) J. Immunol.
Methods 179, 203-214; and Orlandi, R. et al. (1988) Proc. Natl. Acad. Sci. USA 86:
3833). Cloning procedures for whole immunoglobulins (heavy and light chains)
have also been published (see, e.g., Buckel, P. et al. (1987) Gene 51:13-19;
Recinos A 3rd et al. (1994) Gene 149: 385-386; (1995) Gene Jun 9;158(2):311-2;
and Recinos A 3rd et al. (1994) Gene Nov 18;149(2):385-6). Additional protocols
for the cloning and generation of antibody fragment and antibody expression
constructs have been described in Antibody Engineering, R. Kontermann and S.
Dubel (2001), Editors, Springer Verlag: Berlin Heidelberg New York.
[0209] Fungal expression plasmids encoding heavy and light chain of
immunoglobulins have been described (see, e.g., Abdel-Salam, H.A. et al. (2001)
Appl. Microbiol. Biotechnol. 56: 157-164; and Ogunjimi, A.A. et al. (1999)
Biotechnology Letters 21: 561-567). One can thus generate expression plasmids
harboring the constant regions of immunoglobulins. To facilitate the cloning of
variable regions into these expression vectors, suitable restriction sites can be
placed in close proximity to the termini of the variable regions. The constant
regions can be constructed in such a way that the variable regions can be easily
fused in-frame to them by a simple restriction-digest / ligation experiment. Figure 31
shows a schematic overview of such an expression construct, designed in a very
modular way, allowing easy exchange of promoters, transcriptional terminators,
integration targeting domains and even selection markers.
[0210] As shown in Figure 31, VL as well as VH domains of choice can be easily
cloned in-frame with CL and the CH regions, respectively. Initial integration is
targeted to the P. pastoris AOX locus (or homologous locus in another fungal cell)
and the methanol-inducible AOX promoter will drive expression. Alternatively,
any other desired constitutive or inducible promoter cassette may be used. Thus, if
desired, the 5'AOX and 3'AOX regions as well as transcriptional terminator (TT)
fragments can be easily replaced with different TT, promoter and integration
targeting domains to optimize expression. Initially the alpha-factor secretion
signal with the standard KEX protease site is employed to facilitate secretion of
heavy and light chains. The properties of the expression vector may be further
refined using standard techniques.

[0211] An Ig expression vector such as the one described above is introduced
into a host cell of the invention that expresses GnTIII, preferably in the Golgi
apparatus of the host cell. The Ig molecules expressed in such a host cell comprise
N-glycans having bisecting GlcNAcs.

EXAMPLE 7 

Cloning and expression of GnT-IV (UDP-GlcNAc:alpha-1,3-D-mannoside
beta-1,4-N-Acetylglucosaminyltransferase IV) and

GnT-V (beta 1-6-N-acetylglucosaminyltransferase)

[0212] GnTIV-encoding cDNAs were isolated from bovine and human cells
(Minowa, M.T. et al. (1998) J. Biol. Chem. 273 (19), 11556-11562; and
Yoshida, A. et al. (1999) Glycobiology 9 (3), 303-310). The DNA fragments
encoding full length and a part of the human GnT-IV protein (Figure 33) lacking
the TM domain are PCR amplified from the cDNA library using forward
5'-AATGAGATGAGGCTCCGCAATGGAACTG-3',
5'-CTGATTGCTTATCAACGAGAATTCCTTG-3', and reverse
5'-TGTTGGTTTCTCAGATGATCAGTTGGTG-3' primers, respectively.
The resulting PCR products are cloned and sequenced.
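
As a quick plausibility check on the forward primers quoted in [0212], the following Python sketch translates each primer in a chosen reading frame using the standard genetic code. The frame offsets are assumptions made for illustration only: the text does not state where the start codon lies, and the function and constant names are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate dna from the given offset; a trailing partial codon is ignored."""
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Forward primers as quoted in paragraph [0212].
FULL_LENGTH_FWD = "AATGAGATGAGGCTCCGCAATGGAACTG"
TRUNCATED_FWD = "CTGATTGCTTATCAACGAGAATTCCTTG"

# Assumption: the ATG at offset 6 is the intended start codon.
print(translate(FULL_LENGTH_FWD, 6))   # MRLRNGT
# Assumption: the truncated-construct primer is read from its first base.
print(translate(TRUNCATED_FWD, 0))     # LIAYQREFL
```

Reading the first primer from its earlier ATG (offset 1) instead runs into a TGA stop codon after two residues, which is why offset 6 is assumed here.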

[0213] Similarly, genes encoding GnT-V protein have been isolated from several
mammalian species, including mouse. (See, e.g., Alvarez, K. et al. Glycobiology
12 (7), 389-394 (2002)). The DNA fragments encoding full length and a part of
the mouse GnT-V protein (Figure 34) lacking the TM domain are PCR amplified
from the cDNA library using forward 5'-AGAGAGAGATGGCTTTCTTTTCTCCCTGG-3',
5'-AAATCAAGTGGATGAAGGACATGTGGC-3' and reverse
5'-AGCGATGCTATAGGCAGTCTTTGCAGAG-3' primers, respectively. The
resulting PCR products are cloned and sequenced.
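
A reverse primer anneals to the coding strand, so its reverse complement shows the coding-strand sequence it covers. The short Python sketch below computes that for the GnT-V reverse primer quoted in [0213], together with its GC fraction. The helper names are hypothetical, and reading the TAG found in the result as the stop codon of the open reading frame is an inference, not something the text states.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(dna: str) -> float:
    """Fraction of bases that are G or C."""
    seq = dna.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Reverse primer as quoted in paragraph [0213].
GNTV_REVERSE = "AGCGATGCTATAGGCAGTCTTTGCAGAG"

coding_strand = reverse_complement(GNTV_REVERSE)
print(coding_strand)               # CTCTGCAAAGACTGCCTATAGCATCGCT
print(coding_strand.find("TAG"))   # 18, i.e. in frame with the first base
print(gc_fraction(GNTV_REVERSE))   # 0.5
```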

[0214] Nucleic acid fragments comprising sequences encoding GnT IV or V (or 
catalytically active fragments thereof) are cloned into appropriate expression 
vectors for expression, and preferably targeted expression, of these activities in an 
appropriate host cell according to the methods set forth herein.

What is Claimed is:

1. A method for producing a human-like glycoprotein in a non-human
eukaryotic host cell comprising the step of diminishing or depleting the activity of
one or more enzymes in the host cell that transfers a sugar residue to the 1,6 arm of
a lipid-linked oligosaccharide structure.

2. The method of claim 1, further comprising the step of introducing into the
host cell at least one glycosidase activity.

3. The method of claim 2, wherein at least one glycosidase activity is a
mannosidase activity.

4. The method of claim 1, further comprising producing an N-glycan.

5. The method of claim 4, wherein the N-glycan has a GlcNAcManXGlcNAc2
structure wherein X is 3, 4 or 5.

6. The method of claim 5, further comprising the step of expressing within the
host cell one or more enzyme activities, selected from glycosidase and
glycosyltransferase activities, to produce a GlcNAc2Man3GlcNAc2 structure.

7. The method of claim 6, wherein the activity is selected from α-1,2
mannosidase, α-1,3 mannosidase and GnTII activities.

8. The method of claim 1, wherein at least one diminished or depleted enzyme
is selected from the group consisting of an enzyme having dolichyl-P-
Man:Man5GlcNAc2-PP-dolichyl alpha-1,3 mannosyltransferase activity; an
enzyme having dolichyl-P-Man:Man6GlcNAc2-PP-dolichyl alpha-1,2
mannosyltransferase activity and an enzyme having dolichyl-P-
Man:Man7GlcNAc2-PP-dolichyl alpha-1,6 mannosyltransferase activity.

9. The method of claim 1, wherein the diminished or depleted enzyme has
dolichyl-P-Man:Man5GlcNAc2-PP-dolichyl alpha-1,3 mannosyltransferase
activity.

10. The method of claim 1, wherein the enzyme is diminished or depleted by
mutation of a host cell gene encoding the enzymatic activity.

11. The method of claim 10, wherein the mutation is a partial or total deletion
of a host cell gene encoding the enzymatic activity.

12. The method of claim 1, wherein the glycoprotein comprises N-glycans
having seven or fewer mannose residues.

13. The method of claim 1, wherein the glycoprotein comprises N-glycans
having three or fewer mannose residues.

14. The method of claim 1, wherein the glycoprotein comprises one or more
sugars selected from the group consisting of galactose, GlcNAc, sialic acid, and
fucose.

15. The method of claim 1, wherein the glycoprotein comprises at least one
oligosaccharide branch comprising the structure NeuNAc-Gal-GlcNAc-Man.

16. The method of claim 1, wherein the host is a lower eukaryotic cell.

17. The method of claim 1, wherein the host cell is selected from the group
consisting of Pichia pastoris, Pichia finlandica, Pichia trehalophila, Pichia
koclamae, Pichia membranaefaciens, Pichia opuntiae, Pichia thermotolerans,
Pichia salictaria, Pichia guercuum, Pichia pijperi, Pichia stiptis, Pichia
methanolica, Pichia sp., Saccharomyces cerevisiae, Saccharomyces sp., Hansenula
polymorpha, Kluyveromyces sp., Candida albicans, Aspergillus nidulans,
Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, Chrysosporium
lucknowense, Fusarium sp., Fusarium gramineum, Fusarium venenatum and
Neurospora crassa.

18. The method of claim 1, wherein the host cell is further deficient in
expression of initiating α-1,6 mannosyltransferase activity.

19. The method of claim 18, wherein the host cell is an OCH1 mutant of P.
pastoris.

20. The method of claim 1, wherein the host cell expresses GnTI and UDP-
GlcNAc transporter activities.

21. The method of claim 1, wherein the host cell expresses a UDP- or GDP-
specific diphosphatase activity.

22. The method of claim 1, further comprising the step of isolating the
glycoprotein from the host.

23. The method of claim 22, further comprising the step of subjecting the
isolated glycoprotein to at least one further glycosylation reaction in vitro,
subsequent to its isolation from the host.

24. The method of claim 1, further comprising the step of introducing into the
host a nucleic acid molecule encoding one or more enzymes involved in the
production of GlcNAcMan3GlcNAc2 or GlcNAc2Man3GlcNAc2.

25. The method of claim 24, wherein at least one of the enzymes has
mannosidase activity.

26. The method of claim 25, wherein the enzyme has an α-1,2-mannosidase
activity and is derived from mouse, human, Lepidoptera, Aspergillus nidulans, C.
elegans, D. melanogaster, or Bacillus sp.

27. The method of claim 25, wherein the enzyme has an α-1,3-mannosidase
activity.

28. The method of claim 24, wherein at least one enzyme has
glycosyltransferase activity.

29. The method of claim 28, wherein the glycosyltransferase activity is selected
from the group consisting of GnTI and GnTII.

30. The method of claim 24, wherein at least one enzyme is localized by
forming a fusion protein between a catalytic domain of the enzyme and a cellular
targeting signal peptide.

31. The method of claim 30, wherein the fusion protein is encoded by at least
one genetic construct formed by the in-frame ligation of a DNA fragment encoding
a cellular targeting signal peptide with a DNA fragment encoding a glycosylation
enzyme or catalytically active fragment thereof.

32. The method of claim 31, wherein the encoded targeting signal peptide is
derived from a member of the group consisting of mannosyltransferases,
diphosphatases, proteases, GnT I, GnT II, GnT III, GnT IV, GnT V, GnT VI,
GalT, FT, and ST.

33. The method of claim 31, wherein the catalytic domain encodes a
glycosidase or glycosyltransferase that is derived from a member of the group
consisting of GnT I, GnT II, GnT III, GnT IV, GnT V, GnT VI, GalT,
Fucosyltransferase and ST, and wherein the catalytic domain has a pH optimum
within 1.4 pH units of the average pH optimum of other representative enzymes in
the organelle in which the enzyme is localized, or has optimal activity at a pH
between 5.1 and 8.0.






34. The method of claim 31, wherein the nucleic acid molecule encodes one or
more enzymes selected from the group consisting of UDP-GlcNAc transferase,
UDP-galactosyltransferase, GDP-fucosyltransferase, CMP-sialyltransferase, UDP-
GlcNAc transporter, UDP-galactose transporter, GDP-fucose transporter, CMP-
sialic acid transporter, and nucleotide diphosphatases.

35. The method of claim 31, wherein the host expresses GnTI and UDP-
GlcNAc transporter activities.

36. The method of claim 31, wherein the host expresses a UDP- or GDP-
specific diphosphatase activity.

37. The method of claim 1, further comprising the step of introducing into a
host that is deficient in dolichyl-P-Man:Man5GlcNAc2-PP-dolichyl alpha-1,3
mannosyltransferase activity a nucleic acid molecule encoding one or more
enzymes for production of a GlcNAcMan4GlcNAc2 carbohydrate structure.

38. The method of claim 1, further comprising the step of introducing into a
host that is deficient in dolichyl-P-Man:Man6GlcNAc2-PP-dolichyl alpha-1,2
mannosyltransferase or dolichyl-P-Man:Man7GlcNAc2-PP-dolichyl alpha-1,6
mannosyltransferase activity a nucleic acid molecule encoding one or more
enzymes for production of a GlcNAcMan4GlcNAc2 carbohydrate structure.

39. The method of claim 37 or 38, wherein the nucleic acid molecule encodes
at least one enzyme selected from the group consisting of an α-1,2 mannosidase,
UDP GlcNAc transporter and GnTI.

40. The method of claim 39, further comprising the step of introducing into the
deficient host cell a nucleic acid molecule encoding an α-1,3 or an α-1,2/α-1,3
mannosidase activity for the conversion of the GlcNAc1Man4GlcNAc2 structure to
a GlcNAc1Man3GlcNAc2 structure.

41. The method of claim 1, further comprising the step of introducing into the
host a nucleic acid molecule encoding one or more enzymes for production of a
GlcNAc2Man3GlcNAc2 carbohydrate structure.

42. The method of claim 41, wherein at least one enzyme is GnTII.

43. The method of claim 1, further comprising the step of introducing into the
host cell at least one nucleic acid molecule encoding at least one mammalian
glycosylation enzyme selected from the group consisting of a glycosyltransferase,
fucosyltransferase, galactosyltransferase, N-acetylgalactosaminyltransferase, N-
acetylglucosaminyltransferase and sulfotransferase.

44. The method of claim 1, comprising the step of transforming host cells with
a DNA library to produce a genetically mixed cell population expressing at least
one glycosylation enzyme derived from the library, wherein the library comprises
at least two different genetic constructs, at least one of which comprises a DNA
fragment encoding a cellular targeting signal peptide ligated in-frame with a DNA
fragment encoding a glycosylation enzyme or catalytically active fragment thereof.

45. A host cell produced by the method of claim 1 or 44.

46. A human-like glycoprotein produced by the method of claim 1 or 44.

47. A nucleic acid molecule comprising or consisting of at least forty-five
consecutive nucleotide residues of Fig. 6 (P. pastoris ALG3 gene).

48. A vector comprising a nucleic acid molecule of claim 47.

49. A host cell comprising a nucleic acid molecule of claim 47.

50. A P. pastoris cell in which the sequences of Fig. 6 (P. pastoris ALG3
gene) are mutated whereby the glycosylation pattern of the cell is altered.

51. A method to enhance the degree of glucosylation of lipid-linked
oligosaccharides comprising the step of increasing alpha-1,3 glucosyltransferase
activity in a host cell.

52. A method to enhance the degree of glucosylation of lipid-linked
oligosaccharides comprising decreasing the substrate specificity of oligosaccharyl
transferase activity in a host cell.

53. A method for producing in a non-mammalian host cell an immunoglobulin
polypeptide having an N-glycan comprising a bisecting GlcNAc, the method
comprising the step of expressing in the host cell a GnTIII activity.

54. A non-mammalian host cell that produces an immunoglobulin having an N-
glycan comprising a bisecting GlcNAc.

55. An immunoglobulin produced by the host cell of claim 54.

56. A method for producing in a non-human host cell a polypeptide having an
N-glycan comprising a bisecting GlcNAc, the method comprising the step of
expressing in the host cell a GnTIII activity.

57. A non-human host cell that produces a polypeptide having an N-glycan
comprising a bisecting GlcNAc.

58. A polypeptide produced by the host cell of claim 57.

59. A method for producing a human-like glycoprotein in a non-human
eukaryotic host cell comprising the step of diminishing or depleting from the host
cell an alg gene activity and introducing into the host cell at least one glycosidase
activity.

60. A method for producing a human-like glycoprotein having an N-glycan
comprising at least two GlcNAcs attached to a trimannose core.

Lipid-linked N-glycans

ALG3 Blast 05-22-01

Sequences producing significant alignments:                          (bits)  Value

gi|586444|sp|P38179|ALG3_YEAST DOLICHYL-P-MAN:MAN(5)GLCNAC(...          797  0.0
gi|3024226|sp|Q92685|ALG3_HUMAN DOLICHYL-P-MAN:MAN(5)GLCNAC...          173  7e-43
gi|3024221|sp|Q24332|NT56_DROVI LETHAL(2)NEIGHBOUR OF TID P...          145  3e-34
gi|3024222|sp|Q27333|NT56_DROME LETHAL(2)NEIGHBOUR OF TID P...          121  3e-27
gi|10720153|sp|P82149|NT53_DROME LETHAL(2)NEIGHBOUR OF TID ...          121  5e-27
gi|1707982|sp|P40989|GLS2_YEAST 1,3-BETA-GLUCAN SYNTHASE CO...           32  2.8
gi|1346146|sp|P38631|GLS1_YEAST 1,3-BETA-GLUCAN SYNTHASE CO...           31  6.6

Alignments

Yeast

>gi|586444|sp|P38179|ALG3_YEAST DOLICHYL-P-MAN:MAN(5)GLCNAC(2)-PP-DOLICHYL MANNOSYLTRANSFERASE
(DOL-P-MAN DEPENDENT ALPHA(1-3)-MANNOSYLTRANSFERASE)
(HM-1 KILLER TOXIN RESISTANCE PROTEIN)
Length = 458

Score = 797 bits (2059), Expect = 0.0
Identities = 422/458 (92%), Positives = 422/458 (92%)

Query: 1   MEGEQSPQGEKSLQRKQFVRPPLDLWQDLKDGVRYVIFDCRANLIVMPLLILFESMLCKI 60
           MEGEQSPQGEKSLQRKQFVRPPLDLWQDLKDGVRYVIFDCRANLIVMPLLILFESMLCKI
Sbjct: 1   MEGEQSPQGEKSLQRKQFVRPPLDLWQDLKDGVRYVIFDCRANLIVMPLLILFESMLCKI 60

Query: 61  IIKKVAYTEIDYKAYMEQIEMIQLDGMLDYSQVSGGTGPLVYPAGHVLIYKMMYWLTEGM 120
           IIKKVAYTEIDYKAYMEQIEMIQLDGMLDYSQVSGGTGPLVYPAGHVLIYKMMYWLTEGM
Sbjct: 61  IIKKVAYTEIDYKAYMEQIEMIQLDGMLDYSQVSGGTGPLVYPAGHVLIYKMMYWLTEGM 120

Query: 121 DHVERGQVFFRYLYLLTLALQMACYYLLHLPPWCVVLACLSKRLHSIYVLRLFNDCFTTL 180
           DHVERGQVFFRYLYLLTLALQMACYYLLHLPPWCVVLACLSKRLHSIYVLRLFNDCFTTL
Sbjct: 121 DHVERGQVFFRYLYLLTLALQMACYYLLHLPPWCVVLACLSKRLHSIYVLRLFNDCFTTL 180

Query: 181 FMVVTVLGAIVASRCHQRPKLKKSLALVISATYSMAVSIKMNALLYFPAMMISLFILNDA 240
           FMVVTVLGAIVASRCHQRPKLKKSLALVISATYSMAVSIKMNALLYFPAMMISLFILNDA
Sbjct: 181 FMVVTVLGAIVASRCHQRPKLKKSLALVISATYSMAVSIKMNALLYFPAMMISLFILNDA 240

Query: 241 NVILTLLDLVAMIAWQVAVAVPFLRSFPQQYLHCAFNFGRKFMYQWSINWQMMDEEAFND 300
           NVILTLLDLVAMIAWQVAVAVPFLRSFPQQYLHCAFNFGRKFMYQWSINWQMMDEEAFND
Sbjct: 241 NVILTLLDLVAMIAWQVAVAVPFLRSFPQQYLHCAFNFGRKFMYQWSINWQMMDEEAFND 300

Query: 301 KRFXXXXXXXXXXXXXXXFVTRYPRILPDLWSSLCHPLRKNAVLNANPAKTIPFVLIASN 360
           KRF               FVTRYPRILPDLWSSLCHPLRKNAVLNANPAKTIPFVLIASN
Sbjct: 301 KRFHLALLISHLIALTTLFVTRYPRILPDLWSSLCHPLRKNAVLNANPAKTIPFVLIASN 360

Query: 361 FIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFVGPIWYVLHEWCWNSYPPNSQXXXXX 420
           FIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFVGPIWYVLHEWCWNSYPPNSQ
Sbjct: 361 FIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFVGPIWYVLHEWCWNSYPPNSQASTLL 420

Query: 421 XXXXXXXXXXXXXXXXSGSVALAKSHLRTTSSMEKKLN 458
                           SGSVALAKSHLRTTSSMEKKLN
Sbjct: 421 LALNTVLLLLLALTQLSGSVALAKSHLRTTSSMEKKLN 458

Human

>gi|3024226|sp|Q92685|ALG3_HUMAN DOLICHYL-P-MAN:MAN(5)GLCNAC(2)-PP-DOLICHYL MANNOSYLTRANSFERASE
(DOL-P-MAN DEPENDENT ALPHA(1-3)-MANNOSYLTRANSFERASE)
(NOT56-LIKE PROTEIN)
Length = 438

Score = 173 bits (439), Expect = 7e-43
Identities = 133/396 (33%), Positives = 195/396 (48%), Gaps = 28/396 (7%)

Ouerv- 26 WQDLKDGVRYVT FDCRANL IVMPLLI LFE SMLCKI 1 1 KKVAYTEIDYKAYMEQ I EMI QLD 85 

1 * WQ+ R ++ + R L+V L L E + +1 +VAYTE I D + KAYM ++E + ++ 

Sbjct: 29 WQER RLLi^PRYTLLVAACLCXAETO^ 83 

Query- 86 GMLDYSQVSGGTGPLVYPAGHVLIYKMMYWLTEGMDH^ 145 

G DY+Q+ G TGPLVYPAG V 1+ +Y+ T + Q P LYL TL L Y 

Sbjct: 84 GTYDYTQLQGDTGPLVYPAGFVYIFMGLYYATSRGTDI 143 

Query: 146 Y - LLHLP PWC - WLACLS KRLHS I YVLRLFNDCFTTLFMVVTVLGAIVASRCHQRP KLKK 203 

" ^ + +pp+ + C S R+HS I +VLRLFND + + +L + QR — 

Sbjct: 144 HQTCKVPPFVFFFMCCAS YRVHS I FVLRLFNDP VAMVLLFLSINLLLAQRWGWG- 197 

Query 204 S LALV I S ATYSMAVS I KMNALL YFPAMMI S LFI LNDANVI LTLLDLVAMI AWQVAVAVP F 253 

+S+AVS+KMN LL+ P ++ L ■ L L + A + QV + +PF 

Sbjct- 198 - cCFFSLAVSVKMNVLLFAPGLLFLIjLTQFGFRGALPKI^ 249 



Query : 264 . _ 

L P YL +F+ GR+F++ W++NW+ + E F + F t ~t 

Sbjct: 250 LLENPSGYLSRSFDLGRQFLFHWTVNWRFI^ 309 

Ouerv- 324 PRILPDLWSSIX^PIJlKNAvLNAOT 383 

R +SLP++ IL SNFIG+ FSRSLHYQF WY TLP 

Sbjct: 310 HRTGESILSLLRDPSKRKVPPQPLTPNQIVSTLFTSN^^ 369 

Query: 384 ILIF WSGMPFFVGPIWYVLHEWCWNSYPFNS 414 

L++ V? + + + E WN+YP S 

Sbjct: 370 YLLWAMPARWLTHLLRLLVLGLI - - ELSWNTYPSTS 403 

Drosophila virilis

>gi|3024221|sp|Q24332|NT56_DROVI LETHAL(2)NEIGHBOUR OF TID PROTEIN (NOT58)
Length = 526

Score = 145 bits (366), Expect = 3e-34
Identities = 103/273 (37%), Positives = 157/273 (56%), Gaps = 17/273 (6%)

Query: 33 VRYVTFDCRANLIVMPLLILFESMLCKIIIKXVAYTEIDY 92 

++ Y+ F+ A IV L++L E+++ ++I++V YTEID+KAYM++ E L+G +YS 
Sbjct: 34 I KYLAFE PAALP I VS VL IVLAEAVINVLVI QRVP YTE IDWKAYMQE CEGF - LNGTTNYSL 92 

Query: 93 VSGGTGPLVYPAGHVLI YKMMYWLTE<3<DHVERGQWFRYLYLLT^ - LP 151 

+ G TGPLVYPA V IY +Y+LT +V Q F +YLL + L + Y +P 
Sbjct: 93 LRGDTGPLVYPAAFVYIYSGLYYLTGQGTNvRLAQY^ 152 

Query: 152 PWCWIAO>SKRLHSIYVLRLFNDCFTT^ 210 

p+ +VL+ S R+HSIYVLRLFND L +L A + QR L S 
Sbjct: 153 PYVLVLSAFTSYRIHSIYVLRLFNDPVAIL LLYAALNLFLDQRWTLG S 200 

Query: 211 ATYSMAVS I KMNALLYFPAMMISLFI LNDANVT LTLLDLVAMI AWQVAVAVPFLRSFPQQ 270 

YS+AV +KMN + A + LF L + V+ TL+ L Q+ + PFLR+ P + 

Sbjct- 201 I CYSLAVGVKMN - - ILLFAPALLLFYLANLGVLRTLVQLTI CAVLQLFI GAPFLRTHPME 258 
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Query: 271 YLHCAFNFGRKFMYQWS INWQMMDEEAFNDXRF 303 

YIi +F+ GR F ++W++N++ + +E F + F 
SbjCt: 259 YLRGS FDLGRI FEHKWTVNYRFIiS KEIiFEQRE F 291 

Score = 53.3 bits (127), Expect = 1e-06
Identities = 31/62 (50%), Positives = 41/62 (66%), Gaps = 6/62 (9%)

Query: 352 iPFVIiIASNFIGVLFSRSLHYQ^ 409 

+PF L NFIGV +RSLHYQF WY +LP Lr+ WS P+ +G + +L E+CWN+ 
Sbjct: 412 LPFFL--(OTIGVA<^RSIJnrQFYIWYFHSL^^ 467 

Query: 410 YP 411 
YP 

Sbjct: 468 YP 469 



Drosophila melanogaster

>gi|3024222|sp|Q27333|NT56_DROME LETHAL(2)NEIGHBOUR OF TID PROTEIN (NOT56)
(NOT45)
Length = 510

Score = 121 bits (305), Expect = 3e-27
Identities = 96/272 (35%), Positives = 154/272 (56%), Gaps = 17/272 (6%)

Query. 34 RYVTFDCRANLIVMPLLI LFESMLCKI I IKKVAYTEIDYKAYMEQI EWIQIjDGMIiDYSQV 93 

+Y++ + A IV ++L E ++ ++I++V YTEID+ AYM++ E I*+G +YS + 
SbjCt: 36 KYLLLEPAALPIVGLFVLLAEHjVIirVVV^ 94 

Query: 94 SGGTGPL VYPAGHVLI YKMMYWLTE GMDHVT^GQVFFRYL YLLTLAIjQMACYYIjIjH - LP P 152 

G TGPLVYPA V IY +Y++T +V Q F +YLL LAL + Y +PP 
Sbjct: 95 RGDTGPLVYPAAF\TYI YSAL YYVTSHGTNVRIiAQYI FAGI YliLQIiALVLRLYSKSRKVPP 154 

Query: 153 wCVVLACL-SKiaiHSIYVUUiFlTC 211 

+ +VL+ S R+HS I YVLRIiFND + V +L A + +R h S 
Sbjct: 155 YVLVIiSAFTSYRIHSIYVIiRIjFNDP VAVLLLYAALNI1FL1DRRWTLG ST 202 

Query: 212 TY SMAVS I KMNALL YF P AMMI S LF I LNDANV I LTLLDLVAMI AWQ VAVAVP FLRS F P QQ Y 271 

+S+AV +KMN + A + LF I» + ++ T+L L Q+ + PFL + P +Y 

Sbjct: 203 FFSLAVGVKMN - - 1 LL FAPALLIiFYIiANLGLLRTI LQLAVCGVT QLIiLGAP FLLTHFVEY 260 

Query: 272 LHCAFNFGRKFMYQWS INWQMMDEEAFNDKRF 303 

L + F+ GR F ++W++N++ + + F ++ F 
Sbjct: 261 IiRGS FDIjGRI FEHKWTVNYRFLSRDVFENRTF 292 

Score = 49.4 bits (117), Expect = 2e-05
Identities = 27/60 (45%), Positives = 35/60 (58%), Gaps = 2/60 (3%)

Query: 352 IPFVIjIASNFIGVLFSRSLHYQFI>SWYHWTLPILIFWSG^ 411 

+PF L N +GV SRSLHYQF WY +LP L + + V + L E+CWN+YP 

Sbjct: 407 LPFFL - - onjVGVACSRSLHYQFYVWYFHSLPYIAWSTPYSLGTO 464

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 28883317
Number of Sequences: 96469
Number of extensions: 1107545
Number of successful extensions: 2870
Number of sequences better than 10.0: 16
Number of HSP's better than 10.0 without gapping: 5
Number of HSP's successfully gapped in prelim test: 11
Number of HSP's that attempted gapping in prelim test: 2839
Number of HSP's gapped (non-prelim): 23
length of query: 458
length of database: 35,174,128
effective HSP length: 45
effective length of query: 413
effective length of database: 30,833,023
effective search space: 12734038499
effective search space used: 12734038499
T: 11
A: 40
X1: 15 ( 7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 40 (21.8 bits)
S2: 67 (30.4 bits)
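
The "effective" values in the statistics above follow from the raw lengths by simple arithmetic: the effective HSP length is subtracted once from the query and once for every database sequence, and the search space is the product of the two effective lengths. The Python sketch below reproduces the reported figures from the numbers in this listing; the function name is hypothetical and the code is only an illustration of that convention.

```python
def effective_search_space(query_len: int, db_len: int,
                           num_seqs: int, hsp_len: int) -> tuple[int, int, int]:
    """Return (effective query length, effective database length, search space)."""
    eff_query = query_len - hsp_len
    eff_db = db_len - num_seqs * hsp_len
    return eff_query, eff_db, eff_query * eff_db


# Values taken from the BLAST statistics listed above.
eff_query, eff_db, space = effective_search_space(
    query_len=458, db_len=35_174_128, num_seqs=96_469, hsp_len=45
)
print(eff_query)  # 413
print(eff_db)     # 30833023
print(space)      # 12734038499
```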
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FIGURES 

S. cerevisiae ALG3

ATGGAAGGTGAACAGTCTCCGCAAGGTGAAAAGTCTCTGCAAAGGAAGC 

AATTTGTCAGACCTCCGCTGGATCTGTGGCAGGATCTCAAGGACGGTGTG 

CGCTACGTGATCTTCGATTGTAGGGCCAATCTTATCGTTATGCCCCTTTTG 

ATTTTGTTCGAAAGCATGCTGTGCAAGATTATCATTAAGAAGGTAGCTTAC 

ACAGAGATCGATTACAAGGCGTACATGGAGCAGATCGAGATGATTCAGCT 

CGATGGCATGCTGGACTACTCTCAGGTGAGTGGTGGAACGGGCCCGCTGG 

TGTATCCAGCAGGCCACGTCTTGATCTACAAGATGATGTACTGGCTAACA 

GAGGGAATGGACCACGTTGAGCGCGGGCAAGTGTTTTTCAGATACTTGTA 

TCTCCTTACACTGGCGTTACAAATGGCGTGTTACTACCTTTTACATCTACC

ACCGTGGTGTGTGGTCTTGGCGTGCCTCTCTAAAAGATTGCACTCTATTTA 

CGTGCTACGGTTATTCAATGATTGCTTCACTACTTTGTTTATGGTCGTCACG 

GTTTTGGGGGCTATCGTGGCCAGCAGGTGCCATCAGCGCCCCAAATTAAA

GAAGTCCCTTGCGCTGGTGATCTCCGCAACATACAGTATGGCTGTGAGCA 

TTAAGATGAATGCGCTGTTGTATTTCCCTGCAATGATGATTTCTCTATTCAT 

CCTTAATGACGCGAACGTAATCCITACTTOTOGATCT^ 

TGCATGGCAAGTCGCAGTTGCAGTCjOCCl lCulGCuCAGCi i iCCGCAACA 

GTACCTGCATTGCGCTTTTAATTTCGGCAGGAAGTTTATGTACCAATGGAG 

TATCAATTGGCAAATGATGGATGAAGAGGCTTTCAATGATAAGAGGTTCC 

ACTTGGCCCTTTTAATCAGCCACCTGATAGCGCTCACCACACTGTTCGTCA

CAAGATACCCTCGCATCCTGCCCGATTTATGGTCTTCCGTGTGCCATCCGC 

TGAGGAAAAATGCAGTGCTCAATGCCAATCCCGCCAAGACTATTCCATTC 

GTTCTAATCGCATCCAACTTCATCGGCGTCCTATTTTCAAGGTCCCTCCAC 

TACCAGTTTCTATCCTGGTATCACTGGACTTTGCCTATACTGATCTTTTGGT

CGGGAATGCCCTTCTTCGTTGGTCCCATTTGGTACGTCTTGCACGAGTGGT 

GCTGGAATTCCTATCCACCAAACTCACAAGCAAGCACGCTATTGTTGGCA 

TTGAATACTGTTCTGTTGCTTCTATTGGCCTTGACGCAGCTATCTGGTTCGG 

TCGCCCTCGCCAAAAGCCATCTTCGTACCACCAGCTCTATGGAAAAAAAG 

CTCAACTGA 



S. cerevisiae Alg3p

MEGEQSPQGEKSLQRKQFVRPPLDLWQDLKDGVRYVIFDCRANLIVMPLL
ILFESMLCKIIIKKVAYTEIDYKAYMEQIEMIQLDGMLDYSQVSGGTGPL
VYPAGHVLIYKMMYWLTEGMDHVERGQVFFRYLYLLTLALQMACYYLLHL
PPWCVVLACLSKRLHSIYVLRLFNDCFTTLFMVVTVLGAIVASRCHQRPK
LKKSLALVISATYSMAVSIKMNALLYFPAMMISLFILNDANVILTLLDLV
AMIAWQVAVAVPFLRSFPQQYLHCAFNFGRKFMYQWSINWQMMDEEAFND
KRFHLALLISHLIALTTLFVTRYPRILPDLWSSLCHPLRKNAVLNANPAK
TIPFVLIASNFIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFVGPIWY
VLHEWCWNSYPPNSQASTLLLALNTVLLLLLALTQLSGSVALAKSHLRTT
SSMEKKLN

FIGURE 6

P. pastoris ALG3
ATGCCTCCGATAGAGCCAGCTGAAAGGCCAAAGCTTACGCTGAAAAATGT 

TATCGGTGATCTAGTGGCTCTTATTCAAAACGTTTTATTTAACCCAGATTTT 

AGTGTCTTCGTTGCACCTCTTTTATGGTTAGCTGATTCCATTGTTATCAAGG 

TGATCATTGGCACTGTTTCCTACACAGATATTGATTTTTCTTCATATATGCA 

ACAAATCTTTAAAATTCGACAAGGAGAATTAGATTATAGCAACATATTTG 

GTGACACCGGTCCATTGGTTTACCCAGCCGGCCATGTTCATGCTTACTCAG 

TACTTTCGTGGTACAGTGATGGTGGAGAAGACGTCAGTTTCGTTCAACAA 

GCATTTGGTTGGTTATACCTAGGTTGCTTGTTACTATCCATCAGCTCCTACT 

TTTTCTCTGGCTTAGGGAAAATACCTCCGGTTTATTTTGTTTTGTTGGTAGC

GTCCAAGAGACTGCATTCAATATTTGTATTGAGACTCTTCAATGACTGTTT 

AACAACATTTTTGATGTTGGCAACTATAATCATCCTTCAACAAGCAAGTAG 

CTGGAGGAAAGATGGCACAACTATTCCATTATCTGTCCCTGATGCTQGAG 

ATACGTACAGTTTAGCCATCTCTGTAAAGATGAATGCGCTGCTATACCTCC 

CAGCATTCCTACTACTCATATATCTCATTTGTGACGAAAATTTGATTAAAG 

CCTTGGCACCTGTTCTAGTTTTGATATTGGTGCAAGTAGGAGTCGGTTATT 

CGTTCATTTTACCGTTGCACTATGATGATCAGGCAAATGAAATTCGTTCTG 

CCTACTTTAGACAGGCTTTTGACTTTAGTCGCCAATTTCTTTATAAGTGGA 

CGGTTAATTGGCGCTTTTTGAGCCAAGAAACTTTCAACAATGTCCATTTTC 

ACCAGCTCCTGTTTGCTCTCCATATTATTACGTTAGTCTTGTTCATCCTCAA 

GTTCCTCTCTCCTAAAAACATTGGAAAACCGCTTGGTAGATTTGTGTTGGA 

CATTTTCAAATTTTGGAAGCCAACCTTATCTCCAACCAATATTATCAACGA 

CCCAGAAAGAAGCCCAGATTTTGTTTACACCGTCATGGCTACTACCAACTT 

AATAGGGGTGCTTTTTGCAAGATCTTTACACTACCAGTTCCTAAGCTGGTA 

TGCGTTCTCTTTGCCATATCTCCTTTACAAGGCTCGTCTGAACTTTATAGCA 

TCTATTATTGTTTATGCCGCTCACGAGTATTGCTGGTTGGTTTTCCCAGCTA 

CAGAACAAAGTTCCGCGTTGTTGGTATCTATCTTACTACTTATCCTGATTC 

TCATTTTTACCAACGAACAGTTATTTCCTTCTCAATCGGTCCCTGCAGAAA 

AAAAGAATACATAA 



P. pastoris Alg3p

MPPIEPAERPKLTLKNVIGDLVALIQNVLFNPDFSVFVAPLLWLADSIVI
KVIIGTVSYTDIDFSSYMQQIFKIRQGELDYSNIFGDTGPLVYPAGHVHA
YSVLSWYSDGGEDVSFVQQAFGWLYLGCLLLSISSYFFSGLGKIPPVYFV
LLVASKRLHSIFVLRLFNDCLTTFLMLATIIILQQASSWRKDGTTIPLSV
PDAADTYSLAISVKMNALLYLPAFLLLIYLICDENLIKALAPVLVLILVQ
VGVGYSFILPLHYDDQANEIRSAYFRQAFDFSRQFLYKWTVNWRFLSQET
FNNVHFHQLLFALHIITLVLFILKFLSPKNIGKPLGRFVLDIFKFWKPTL
SPTNIINDPERSPDFVYTVMATTNLIGVLFARSLHYQFLSWYAFSLPYLL
YKARLNFIASIIVYAAHEYCWLVFPATEQSSALLVSILLLILILIFTNEQ
LFPSQSVPAEKKNT

P. pastoris ALG3 BLAST

Sequences producing significant alignments:                          (bits)  Value

gi|586444|sp|P38179|ALG3_YEAST Dolichyl-P-Man:Man(5)GlcNAc(...          228  2e-58
gi|12802365|gb|AAK07848.1|AF309689_10 putative NOT-56 manno...          212  8e-54
gi|984725|gb|AAA75352.1| ORF 1                                          206  4e-52
gi|7492702|pir||T39084 probable mannosyltransferase - fissi...          176  8e-43
gi|16226531|gb|AAL16193.1|AF428424_1 At2g47760/F17A22.15 [A...          164  2e-39
gi|25367230|pir||B84919 Not56-like protein [imported] - Ara...          164  3e-39
gi|25814791|emb|CAB70171.2| Hypothetical protein K09E4.2 [C...          161  2e-38
gi|17535001|ref|NP_496950.1| Putative plasma membrane membr...          160  3e-38
gi|1654000|emb|CAA70220.1| Not56-like protein [Homo sapiens...          155  2e-36
gi|13279206|gb|AAH04313.1|AAH04313 Unknown (protein for IMA...          154  2e-36
gi|22122365|ref|NP_666051.1| hypothetical protein MGC36684 ...          150  3e-35
gi|21292031|gb|EAA04176.1| agCP3388 [Anopheles gambiae str....          120  4e-26
gi|1780792|emb|CAA71167.1| lethal(2)neighbour of tid [Droso...          114  3e-24



Alignments

S. cerevisiae

Score = 228 bits (580), Expect = 2e-58
Identities = 154/429 (35%), Positives = 229/429 (53%), Gaps = 37/429 (8%)

Query: 9 RPKLTLKNVI GDLVALI QNVLFNPDFSVFVAPLLVTLADS I VT KVI I GTVS YTDIDFSSYM 68 

RPLL DL ++ V+F+ ++ V PLL L +S++ K+II V+YT+ID+ +YM 

Sbjct: 20 RPPLDLWQ DLKDGVRYVI FDCRANLIVMPLLI LFESMLCKI 1 1 KKVAYTE I D YKAYM 76 

Query: 69 QQIFKIR- QGELDYSNI FC3DTGPLVYPAGHVHAYSVLSWYSDGGEDVS FVQQAFGWLYLG 127 

+QI 1+ G LDYS + G TGPLVYPAGHV Y ++ W ++G + V Q F +LYL 
Sbjct: 77 EQI EMI QLDGMLDYSQVS GGTGPLVYPAGHVL I YKMMYWLTEGMDHVERGQVFFRYLYLL 136 

Query: 128 CLLLS I S S YFFSGLGKI PPVYFVLLVAS KRLHS I FVLRIiFNDCLTTFXiMLATI IILQ 184 

L L ++ Y+ L +PP VL SKRLHSI+VLRLFNDC TT M+ T+ 1+ 
Sbjct: 137 TliALQMACYY LLHLPFWCVVLACLSKRIiHS I YVliRLFNTC 193 

Query: 185 QASSWRKDGTTI PliSVPDAADTYSLAI SVKMNXXXXXXXXXXXXXXXCDENLI KALAPXX 244 

+ K ++ L + + TYS+A+S+KMN D N+I h 

Sbjct: 194 RCHQRPKLKKSLALVT SATYSMAVS I KMNALLYFPAMMISLFI LNDANVILTLLDLV 250 

Query: 245 XXXXXXXXXXYSFILPLHYDDQANEIRSAYFRQAFDFSRQFLYKWTVNW 304 

F+ Y AF+F R+F+Y+W++NW+ + +E FN+ 

Sbjct: 251 AMI AWQVAVAVP FL - - --RSFPQQYLHCAFNFGRKFMYQWSINWQMMDEEAFNDK 301 

Query: 305 HFHQLLFALHIITL-VLFILKFLSPKNIGKPLGRFVIiD^ 362 

FH L H+I h LF+ ++ R + D++ L ++N +P ++ 

Sbjct: 302 RFHLALLISHLIALTTLFVTRY PRILPDLWSSLCHPLRKNAVIiNANPAKT 351 

Query: 363 PDFVYTVMATTNLIGVLFARSLHYQFLSWYAFSL^ 422 

F V+ +N IGVLF+RSLHYQFLSWY ++LP L++ + + F I Y HE+CW 
Sbjct: 352 IPF VLIASNFIGVljFSRSIiHYQFLSVrYHWTLP 408 

Query: 423 VFPATEQSS 431 

+P Q+S 
Sbjct: 409 SYPPNSQAS 417

Neurospora crassa

Score = 212 bits (540), Expect = 8e-54
Identities = 140/400 (35%), Positives = 212/400 (53%), Gaps = 29/400 (7%)

Query: 35 SVFVAPLLWLADSIVTKVII^ 94 

S + P Ih-L D+++ +11 V YT+ID+++YM+Q+ +1 GE DY+ + G TGPLVYP 
Sbjct: 33 SKLIP PALFLVDALLCGLI I WKVP YTE IDWAAYMEQVSQI LS GERD YTKVRGGTGPLVYP 92 

Query: 95 AGHVHAYSVLSWYSDGGEDVS FVQQAFGV^YLGCIiliLS I SS YFFSGLGKI PP VYFVLLVA 154 

A HV+ Y+ L +D G ++ QQ F LY+ L + + Y+ K PP F LL 

Sbjct: 93 AAHVYIYTGLYHLTDEGRNILLAQQL^^ QAKAPPYLFPLLTL 149 

Query: 155 SKRLHSIFVXiRLFNDCLTTFLMLATIIILQQASSWRKTC 214 

SKRLHSIFVLR FNDC + I Q+ +W+ A Y+L + VK 

Sbjct: 150 SKRLHSIFVLRCFNDCFAVLFIiWIAIFFFQR-RNWQA GALLYTLGLGVK 197 

Query: 215 MNXXXXXXXXXXXXT^ 2 74 

M ++L F+HY+Y 
Sbjct: 198 MTI^LSLPAVGIVLFLGSG- SFVTTLQLVATMGLVQILIGVPFL- -AHYPTE Y 247 

Query: 275 FRQAFD FS RQ FL YKWTVNWRFL SQETFNNVHFHQ LL FAIiHI I TLVLF I -LKFLSPKNIGK 333 

+AF+ SRQF +KWTVNWRF+ +E F + F L ALH++ L +FI +++ P K 
Sbjct: 248 LSRAFELSRQFFFKWTVNWRFVGEEIFLSKGFALTLLALHVLVIjGIFITTO - K 305 

Query: 334 PLGRFVLDIFKFWKPTLS - PTKTI INDPERSPDF\TYTVMATTNLIGVLFARSLHYQFLSWY 392 

L + + + KPL+P+ ++P++T + + N +G+LFARSLHYQF ++ 

Sbjct: 306 SLVQLISPVLLAGKPPLTVPEHRAAARDVTPRYII^ILSANAVG 355 

Query: 393 AF S L P YLL YKARLNF I AS 1 1 VYAAHE YCWLVF P ATEQ S S A 432 

A+S P+LL++A L+ + +++A HE+ W VFP+T SSA 
Sbjct: 366 AWSTPFLLWRAGLHPVLVYLLWAVHEWAWNVFPSTPASSA 405

Schizosaccharomyces pombe

Score = 176 bits (445), Expect = 8e-43
Identities = 132/390 (33%), Positives = 194/390 (49%), Gaps = 35/390 (8%)

Query: 42 LWIADSIVTKVTilGTOSYTDIDFSSYMQQIFK^ 101 

L L + + II V YT+ID+ +YM+Q+ GE DY ++ G TGPLVYP GHV Y 

Sbjct: 30 LLLLEI PFVFAI I SKVPYTEIDWI AYl^Q VNSFLLGERDYKSLVGCTGPLVYPGGHVFLY 89 

Query: 102 S VXiSWYSDGGEDVS FVQQAFGWL YLGCLLLS I SS YFFSGLGKI PP VYFVLLVASKRLHS I 161 

++L + +DGG ++ Q F ++Y + +1 Y F + + P +VLL+ SKRLHSI 
Sbjct: 90 TLLYYLTDGGTNIVRAQYIFAFVYW- - ITTAIVGYLFK- IVRAPFYIYVLLILSKRLHSI 146 

Query: 162 FVLRLFNDCLTTFLMLATI I ILQQASSWRKDGTTI PLSVPDAADTYSLAISVKMNXXXXX 221 

F+LRLFND + L + 1+ W + A+ S+A SVKM+ 

Sbjct: 147 FILRJjFNDGFNS - LFSSLFILSSCKKKWVR ASILLSVACSVKMSSLLYV 194 

Query: 222 XXXXXXXXXXCDENLIK^^ 2 B1 

L++ L p + + + +Y+ QAFDF 

Sbjct: 195 PAYLVL LLQILGPKKTWMHIFVTIIVQILFSIPF LAYFWSYWTQAFDF 242 

Query: 282 SRQFLYKWTVNWRFLSQETFNNVHFHQLLFALHI ITLVLFILKFLSPKNTGKPLGRFVLD 341 

R F YKWTVNWRF+ + F + F + LH+ DV F K + + p 
Sbjct: 243 GRAFD YKWTVNWRF I PRS I FESTS FSTS I LFLHVALLVAFTCKHWNKLSRATP 295 

Query: 342 IFKFWKPTLSPTNIINDPERSPDFVYTVMATTNLIGVLFARSI^ 401 
F L+ + +P+F++T +AT+NLIG+L ARSLHYQF +W+A+ PYL Y 
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Sbjct: 296 - FAMVNSMLTLKPLPKLQLATPNFI FT ALATSNL I G ILCARSLHYQFYAWFAWYS P YIjCY 354 

Query: 402 KARLNF I AS 1 I VYAAHEYCWLVF PATEQS S 431 

+A I ++ EY W VFP+T+ SS 

Sbjct: 355 QASFPAPIVIGLWMLQEYAWNVFPSTKtiSS 384 

Arabidopsis thaliana

Score = 164 bits (415), Expect = 2e-39
Identities = 131/391 (33%), Positives = 194/391 (49%), Gaps = 29/391 (7%)

Ouerv- 42 IiWI»ADS I VT KVI I GTVS YTDIDFSS YMQQIFKIRQGELDYSNI FGDTGP tiVYPAGHVHAY 101 

L LAD+I++ +11 V YT ID+ +YM Q+ GE DY N + GDTGPLVYPAG ++ Y 

Sbjct: 39 L I LADAI LVAL 1 1 AYVPYTKlDWDAYMSQVSGFIiGGERD YGNIiKGDTGP LVYP AGFLYVY 98 

Ouerv- 102 SVLSWYSDGGEDVSFVQQAFGWLYLGCLLLSISSYFFSGLGKJPFVY^ 161 
W ^' 3 + + G +V Q FG LY+ L + + Y + + +P I* SKR+HSI 

Sbjct: 99 SAVQNLTGG--EVYPAQILFGVLYIVNIjGIVLIIYVKTDV- -VPWWALSLLCLiSKRXHSI 154 

Query- 162 FVLRLFNDCLTTFLMLATI 1 1 LQQASSWRKDGTTIPLSVPDAADTYSLAI SVKMNXXXXX 221 

FVLRLFNDC L+ A++ + +RK + + +S A+SVKMN 

Sbict: 155 FVLRLFNDCFAMTUjHASMALFL YRKWHLGMLV FSGAVSVKMNVLLYA 202 

Ouerv- 222 XXXXXXXXXXCDENLI 281 
UUery ' H+I ++ F ++ +Y AFD 

Sbjct: 203 PTIiLLIiliLKAM - -NIIGWSALAGAALAQILVGLPFLITYPV SYIANAFDL 251 

Query- 2B2 SRQFLYKWTVNWRFLSQETFWNVHFHQI^ 341 

R F++ W+VN++F+ + F + F L H+ LV F + K+ G +G 
Sbjct: 252 GRVFIHFWSVNFKFVPERVFVSKEF 310 

Ouerv- 342 I FKFWKP - TLSPTNI INDPERSPDFVYTVMATTNLIG 400 
W y * F p +LS +++ + + V T M N IG++FARSLHYQF SWY +SLPYLL 

Sbjct: 311 HFFLTLPSSLSFSDVSASRIITKEHVVTAMFVGOT 370 

Query: 401 YKARLNF I AS I IVYAAHE Y CWLVF P ATEQS S 431 

++ +3;++ E CW V+P+T SS 

Sbjct: 371 WRTPFPTWLRLIMFLGIELCWNVYPSTPSSS 401 






FIGURE 8 



K. lactis ALG3

TTTGTTTACAAGCTGATACCAACGAACATGAATACACCGGCAGGTTTACT 

GAAGATTGGCAAAGCTAACCTTTTACATCCTTTTACCGATGCTGTATTCAG 

TGCGATGAGAGTAAACGCAGAACAAATTGCATACATTTTACTTGTTACCA 

ATTACATTGGAGTACTATTTGCTCGATCATTACACTACCAATTCCTATCTT 

GGTACCATTGGACGTTACCAGTACTATTGAATTGGGCCAATGTTCCGTATC 

CGCTATGTGTGCTATGGTACCTAACACATGAGTGGTGCTGGAACAGCTAT 

CCGCCAAACGCTACTGCATCCACACTGCTACACGCGTGTAACACATACTG 

TTATTGGCTGTATTCTTAAGAGGACCCGCAAACTCGAAAAGTGGTGATAA 

CGAAACAACACACGAGAAAGCTGAG 

K. lactis Alg3p 

FVYKLIPTNMNTPAGLLOGKANLLHPFIT)AWSAMRVNAEQIAm 
GVLFARSIJIYQFI^WYHWTIJPVLLNWANVPYPI^VLWYLTHEW 
NATASTLLHACNTY CYWLYSZEDPQTRKS^VITKQHTRKL 
                                                                     Score    E

Sequences producing significant alignments:                          (bits) Value

gi|586444|sp|P38179|ALG3_YEAST Dolichyl-P-Man:Man(5)GlcNAc(...        125   1e-28
gi|984725|gb|AAA753S2.1| ORF 1                                         94   *e-19
gi|16226531|gb|AAL16193.1|AF428424_1 At2g47760/F17A22.15 [A...         72   1e-12
gi|25367230|pir||B84919 Not56-like protein [imported] - Ara...         72   1e-12
gi|21292031|gb|EAA04176.1| agCP3388 [Anopheles gambiae str...          69   2e-11
gi|20892051|ref|XP_148657.1| similar to Lethal (2) neighbour...        65   2e-10



Alignments 



S. cerevisiae 



Score = 125 bits, Expect = 1e-28

Identities = 60/120 (50%), Positives = 83/120 (69%), Gaps = 1/120 (0%)
Frame = +3 

Query: 66 ANLMPFT-DAOTSAMRVNAEQIAYIli^ 242 

++L HP +AV +A A+ I ++L+ +N+IGVLF+RSLHYQFLSWYHWTLP+L+ W+ 
Sbjct: 332 S S LCHPLRKNAVLNANP - - AKT I PFVLI ASNFI GVLFSRS LHYQFLSWYHWTLP I LI FWS 389 

Query: 243 NvWPLCVLWYLTHEWCTTOYP 422 

+p+ + +WY+ HEWCWNSYPPN+ ASTLL A NT L+ +V + KHR 

Sbjct: 390 GMP FFVGP XWYVLHEWCWNS YP PNSQASTLLLALNTVIjLLLLA- LTQLSGSVAIiAKSHLR 44 B 



A. thaliana
Score = 72.0 bits (175), Expect = 1e-12

Identities = 42/107 (39%), Positives = 57/107 (53%), Gaps = 3/107 (2%)
Frame = +3

Query: 84 FTDAVFSAMRVNAEQI^YILLVTNYIGVLFARSIiHYQFLS 263 

F+D ' S + + E + + V N+IG++FARSLHYQF SWY ++LP LL PL 
Sbjct: 322 FSDVSASRI - ITKEHVVTAMFVGNFIGIWARSLHYQFYSWYFYSLPYLL.WRTPFPTWLR 380 

Query: 264 VLWYLTHEWCWNSYPPNATASTL LHACNTYCYWLYS*EDPQTRK 395 

++ +L E CWN YP ++S L LH WL DP K 

Sbjct: 381 LI^LGIELCWNVYPSTPSSSGLLLOiHLIILVGLWIAPSVDPYQLK 427 
S. cerevisiae ALG9

ATGAATTGCAAGGCGGTAACCATTAGTTrATTACTGTTGTrATTITrAACAAGAGT 

ATATATTCAGCCGACATTCTCGTTAATTTCAGATTGCGATGAAACITTTAATTA^ 

GGGAA(XATTAAATTTATTGGTACGTGGATTTGGTAAACAAACCTGGGAATATTC 

ACCCGAGTATTCTATTAGATCATGGGCTTTCTrATTACCT TTTTA 

TCCAGTAAACAAATTTACrGACCTAGAAAGTCATTGGAAO'Il i ilCATCACAAGA 

GCATGCTTAGGCTTITITAGTmATCATGGAATITAAACTACATCGTGAAATTGC 

AGGCAGCTTGGCATTGCAAATCGCAAATATTTGGATTATITrCCAATTGTTTAATC 

CGGGCTGGTTCCATGCATCTGTGGAATTATTGCCTTCTGCCGTTGCCATGTTGTTG 

TATGTAGGTGCCACCAGACACTCTCTACGCTATCTGTCCACTGGGTCTACTTCTAA 

CTITACGAAAAGTITAGCGTACAATTrCCrGGCTAGTATACTAGGCTGGCCATTTG 

TTTTAATTTTAAGCTTGCCATTATGTTTACA^ 

CTACCATCAGAACCGCATTCGACTGCTGTTTGATATTTTCATTGACTGCAriTGCT 

GTGATTGTCACTGACAGTATATTTTACGGGAAGCTTGCTCCTGTATCATGGAACA 

TCTTATTTTACAATGTCATTAATGCAAGTGAGGAATCTGGCCCAAATATTTTCGGG 

GTTGAGCCATGGTACTACTATCCACTAAATTTGTTACTGAATTTCCCACTGCCTGT 

GCTAGTTTTAGCTATITrGGGAATTTTCCATTrGAGATTATGGCCATTATGGGCAT 

CATTATTCACATGGATTGCCGTTTTCACTCAACAACCTCACAAAGAGGAAAGATT 

TCTCTATCCAATTTACGGGTTAATAACTTTGAGTGCAAGTATCGCCTTTTACAAAG 

TGTTGAATGTATTCAATAGAAAGCCGATTCTTAAAAAAGGT ATAA AGTTGTCAGT 

TTTATTAATTGTTGCAGGCCAGGCAATGTCACGGATAGTGGCrrTGGTGAACAAT 

TACACAGCTCCTATAGCCGTCTACGAGCAATTITCTTCACTAAATCAAGGTGGTG 

TGAAGGCACCX3GTAGTGAATGTATGTACGGGACGTGAATGGTATCACTTCCCAAG 

TTCTTTCCTGCTGCCAGATAATCATAGGCTAAAATTTGTTAAATCTGGATTTGATG 

GTCTTCTTCCAGGTGATITrCCAGAGAGTGGTTCTATTTTCAAAAAGATTAGAACT 

TTACCTAAGGGAATGAATAACAAGAATATATATGATACCGGTAAAGAGTGGCCG 

ATCACTAGATGTGATTATTTTATTGACATCGTCGCCCCAATAAATTTAACAAAAG 

ACGTTTTCAACCCTCTACATCTGATGGATAACTGGAATAA GCTG GGATGTGCTGC 

ATTCATCGACGGTGAAAATTCTAAGATTTTGGGTAGAGCATTTTACGTACCGGAG 

CCAATCAACCGAATCATGCAAATAGTTTTACCAAAACAATGGAATCAAGTGTACG 

GTGTTCGTTACATTGATTACTGTTTGTTTGAAAAACCAACTGAGACTACTAATTGA 



S. cerevisiae Alg9p 

MNCKAVTISLLLIXFLTRVYIQPITSLISI^ 

YSJRSWAFLLPFYCILYPVNKFroLESHW^ 

IANIWEFQIJWGWFHASTO^ 

FLASILGWPFVLEJSLPLCLHYLFNHI^ 

VSWNILFYmONASffiSGPNIFGVFJPW 

ASLFTWIAVFTQQPHKEERi^YPrYGLITLSASIAFYKVLNLFNRO 

VAGQAMSMVALVNNYTAPIAVYEQFSSmQGGVKAP^ 

DNHRLKTVKSGFDGLLPGDFTESGSIFKKIRTLPKGMNNK^ 

DIVAPINLTKDVFNPLHL]^^ 

KQWNQVYGVRYIDYCLFEKPTETTN 
T^OTOTGTCTGCTCGATACTTCCTmACAGTAACCAAC^TAC^TGTT

CTCCAACATGCTCTTGTATGTATTGGCCTATTCTATCTTGAGACTTGATATC 

AACCTTCTATGGTATTATTTCAGACTGTGATGAAGTGTTCAACTACTGGGA 

GCCACTCAACTTCATGCTTAGAGGGTTTGGAAAACAGACTTGGGAGTATT 

CTCCAGAGTATGCCATCCGATCTTGGTCCTATCTAGTGCCACTTTGGATAG 

CAGGCTATCCACCATTGTTCCTGGATATCCCTTCTTACTACTTTTTCTACTT 

TTTCAGACTACTGCTGGTTATTTTTTCATTGGTTGCAGAAGTCAAGTTGTA 

CCATAGTTTGAAGAAAAATGTCAGCAGTAAGATCAGTTTCTGGTACCTTCT 

ATTTACAACCGTTGCTCCAGGAATGTCTCATAGCACGATAGCCTTATTACC 

ATCCTCTTTTGCTATGGTTTGTCACACTTTTGCCATTAGATACGTCATTGAT 

TACCTACAATTACCAACATTAATGCGCACAATCAGAGAGACTGCTGCCAT 

CTCACCAGCTCACAAACAACAACTAGCCAACTCTCTC 

P. pastoris Alg9p 

W?SCLLDTSFY3NQKTCSPTCSw^ 

NFMLRGFGKQTWEYSPEYAIEISWSYLVPLWIAGYPPLFLDIPSYYFFYFFRLLL 

WSLVAEVKLYHSLKKISrVSSKISFWYIIJETTVAPGMS^ 

TFAIRYVIDYLQLPTLMRTIRETAAISPAHKQQLANSL 
P. pastoris ALG9 BLAST



                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value

gi|6324110|ref|NP_014180.1| catalyzes the transfer of manno...        131   1e-29
gi|21296668|gb|EAA08813.1| agCP7810 [Anopheles gambiae str...         110   2e-23
gi|7019765|emb|CAB75773.1| putative mannosyl transferase inv...       104   1e-21
gi|26341066|dbj|BAC34195.1| unnamed protein product [Mus mu...         99   4e-20
gi|16551378|gb|AAL25798.1| DIBD1 [Homo sapiens]                        99   4e-20
gi|19527202|ref|NP_598742.1| RIKEN cDNA B230402H15 [Mus mus...         99   4e-20
gi|12053349|emb|CAB66861.1| hypothetical protein [Homo sapi...         99   4e-20



Alignments 



S. cerevisiae 
Score = 131 bits (329), Expect = 1e-29

Identities = 62/141 (43%), Positives = 91/141 (64%), Gaps = 1/141 (0%) 
Frame = +2 

Query: 200 ISTFYGIISDC^EVTTNYWEPLNFMLRGFGKQTTOY 376 

I + +ISDCDE FNYWEPLN ++RGFGKQTWEYSPEY+IRSW++L+P + YP F 
Sbjct: 21 IQPTFSLISDCDETFNYWEPLNLLVRGFGKETWEYSPEYSI^^ 80 

Query: 377 LDIPSXXXXXXXRLLLVIFSI^^^ 556 

D+ S R L FS + E KL+ + +++ +1+ +++F PG H+++ L 

Sbjct: 81 TDLE SHWNFFI TRACLGFFS F IMEFKLHRE IAGSLALQ I ANI WI I FQLFNPGWFHASVEL 140 

Query: 557 LPS S FAMVCHTFAIRYVTD YL 619 

LPS+ AM+ + A R+ + YL 
Sbjct: 141 LPSAVAMLLYVGATRHSLRYL 161 



Anopheles gambiae 
Score = 110 bits (274), Expect = 2e-23 

Identities = 58/130 (44%), Positives = 79/130 (60%), Gaps = 3/130 (2%) 
Frame = +2 

Query: 197 LISTFYGIISDCDEVTNYWEPLNFMLRGFGKQTW^ 376 

L S Y IISDCDE +NYWEPL+++L+G G QTWEYSPE+A+RS+SY LW+ G P 
Sbjct: 34 LQSALYSIISDCDETYNYWEPLHYLLKGKGFQTWEYSPEFALRSYSY---LWLHGLPAKV 90 

Query: 377 LDIPS XXXXXXXRLLL VI FSLVAEVKLYHSLKKNVSS KIS FWYLLFTTVAPGMSHST 547 

L + + RLL+ +E +LY L + ++ +LLF + GM S+ 

Sbjct: 91 LQim^GVLIFYFTOCLLAVTC^LEYRLYRILGRKCGGG 150 

Query: 548 IALLPSSFAM 577 

ALLPSSF+M 
Sbjct: 151 AALLPSSFSM 160 
S. pombe

Score = 104 bits (260), Expect = 1e-21

Identities = 58/157 (36%), Positives = 85/157 (54%)

Frame = +2

Query: 197 LISTFYGIISDCDEVFNYWEPLNF^GFGK^^ 376

L S + +1 DCDEV+NYWEPL+++L G+G QTWEYS PEYAIRSW Y+ + G+ 
Sbjct: 26 LTSAS FRV I DDCDEVYNYWEPLHYLLYGYGLQTWE YS PE YAIRSWFYI ALHAVPGFLARG 85 

Query: 377 LDIPSXXXXXXXRLLLVTFSLVAEVTO,^ 556

L + R +L FS E L ++ +N + ++ V GM ++ + 

Sbjct: 86 LGLSRIiHVFYFIRGTVIiACFSAFC^TNLII^ 145 

Query: 557 LPSSFAMVCHTFAIRWIDYLQLPTLMRTIRETAAIS 667 

LPSSFAM T A+ L P+ RT++ +1+ 

Sbjct: 146 LPSSFAMNMVTLALS AQLSPPSTKRTVKWSFIT 179 



M. musculus 
Score = 99.4 bits (246), Expect = 4e-20 

Identities = 57/143 (39%), Positives = 76/143 (53%), Gaps = 1/143 (0%)
Frame = +2 

Query: 152 SPTCSCMYWPILS*DLISTFYGIISI>CDEvFNYTO 331 

+ p s + +LS L + ISDCDE FNYWEP ++++ G G QTWEYS P YAIRS+ 

Sbjct: 55 APEGSTAFKCLLSARIiCAALLSNISDC 114 

Query: 332 SY-LVPLWIAGYPPLFLDIPSXXXXXXXRLI^^ 508 

+Y L+ W A + L R LL S V E+ Y ++ K +S L 

Sbjct: 115 AYLLLHAWPAAFHARI LQTNKI LVF YFLRCLLAFVS CV CELYFYKAVCKKFGLHVSRMML 174 

Query: 509 LFTTVAPC34SHSTIAIiLPSSFAM 577 

F ++ GM S+ A LPSSF M 
Sbjct: 175 AFLVLSTGMFCSSSAFLPSSFCM 197 



H. sapiens 
Score = 99.4 bits (246), Expect = 4e-20 

Identities = 56/143 (39%), Positives = 76/143 (53%), Gaps = 1/143 (0%)
Frame = +2 

Query: 152 SPTCSCMYWPII£*DLISTFYGIISDCDEvFN^ 331 

+P S + +LS L + ISDCDE FNYWEP ++++ G G QTWEYS P YAIRS+ 

Sbjct: 55 APEGSTAFKCLLSARLCAALLSNISDCDETFNY^ 114 

Query: 332 SY-LVPLWIAGYPPLFLDIPSXXXXXXXRLLLVI^^ 508 

+Y L+ W A + L R LL S + E+ Y ++ K +S L 

Sbjct: 115 AYLLIiHAWPAAFHARI LQTNOLVFYFIjRCLLAFVSCI CELYFYKAVCKKFGLHVSRMML 174 

Query: 509 LFTTVAPGMSHSTI ALLPSS FAM 577 

F ++ GM S+ A LPSSF M 
Sbjct: 175 AFLVLSTGMFCSSSAFLPSSFCM 197 
S. cerevisiae ALG12

ATGCGTTGGTCTGTCCTTGATACAGTGCTATTGACCGTGATTTCCTTTCATCTAAT 

CCAAGCTCCATTCACCAAGGTGGAAGAGAGTTTTAATATTCAAGCCATTCATGAT 

ATTTTAACCTACAGCGTAT1TGATATCTCCCAATATGACCACTTGAAA.TTTCCTGG 

AGTAGTCCCTAGAACATTCGTTGGTGCTGTGATrATTGCAATGCTTTCGAGACCTT 

ATCTTTACTrGAGTrCITrGATCCAAACTTCCAGGCCTACG TCTAT AGATGTTCAA 

TTGGTCGTTAGGGGGATTGTTGGCCTCACCAATGG 

GAATTGTITGCAAGATATGTTTGATGAAATCACTGAAAAGAAAAAG GAAGAAAA 

TGAAGACAAGGATATATACATTTACGATAGCGCTGGTACATGGTTTCTTITATTTT 

TAATTGGCAGTTTCCACCTCATGTTCTACAGCACTAG GACTC TGCCTAATTTTGTC 

ATGACTCTGCCTCTAACCAACGTCGCATTGGGGTGGGTTITATTGGGTCGTTATAA 

TGCAGCTATATTCCTATCTGCGCTCGTGGCAATTGTATTTAGACTGGAAGTGTCAG 

CTCTCAGTGCTGGTATlGCTCTATTTA 

GATGCTATCAAATTCGGTATCTITGGCTTGGGACTTGGTTCCGCCATCAGTATCAC 

CGTTGATTCATATTTCTGGCAAGAATGGTGTCTACCTGAGGTAGATGGTITCTTGT 

TCAACGTGGTTGCGGGTTACGCTTCCAAGTGGGGTGTGGAGCCAGTTACTGCTTA 

TTTCACGCATTACTTGAGAATGATGTTTATGCCACCAACTGTTTrACTATTGAA^ 

ACITCGGCTATAAATTAGCACCTGCAAAATTAAAAATTGTCTCACTAGCATCTCTT 

TTCCACATTATCGTCTTATCCITrCAACCTCACAAAGAATGGAGATTCATCATCTA 

CGCTGTTCCATCTATCATGTTGCTAGGTGCCACAGGAGCAGCACATCTATGGGAG 

AATATGAAAGTAAAAAAGATTACCAATGTTTTATGTTTGGCTATATTGCCCTTATC 

TATAATGACCTCCITTTTCATITCAATGGCGTrCTTGTATATATCAAGAATGAATT 

ATCCAGGCGGCGAGGCTTTAACTTCTTTTAATGACATGATTGTGGAAAAAAATAT 

TACAAACGCTACAGrrCATATCAGCATACCTCCTTGCATGACAGGTGTCACTTTAT 

TTGGTGAATTGAACTACGGTGTGTACG GCAT CAATTACGATAAGACTGAAAATAC 

GACTTTACTGCAGGAAATGTGGCCCTCCTITGATrrCTTGATCACCCACGAGCCA 

ACCGCCTCrCAATTGCCATrCGAGAATAAGACTACCAACCATTGGG AGCTAGTTA 

ACACAACAAAGATGTTTACTGGATTTGACCCAACCTACATTAAGAACnrrTGTTTT 

CCAAGAGAGAGTGAATGTTTTGTCTCTACTCAAACAGATCATTTTCGACAAGACC 

CCTACCG 1T1 ' 11 ' 1 1 G AAAG AATTGACGGCCAATTCGATTGTTAAAAGCG ATGTCTT 

CITCACCTATAAGAGAATCAAACAAGATGAAAAAACTGATTGA 

S. cerevisiae Alg12p

KffiWSVLDTVLLTVISFHLIQAPFT^^ 

RTFVGAVnAMLSPJPYLYI^SLIQTSRPTC 

FDErreKKKEENEDKDIYIYDSAGTWnJJFLIGSFHI^^ 

GWVLLGRYNAAIFI£ALVAIVFRLEVSAI£A^ 

AISnVDSYFWQEWCIJPEVDGFLFNWAGYAf! 

LLLNYFGYKLAPAKLKIVSLASI^^ 

EmiKVKOINVLCIAILPLSlMTSFFISMAF^ 

ATVfflSIPPCMTGVTLFGELNYGVYGINYD^ 

FENKTTNHWELV>mXMFrG^ 

ANSIVKSDVFFTYKRIKQDEKTD 
P. pastoris ALG12

TCGGTCGAGAATGATAACTGAAGAACTCAAAATCTCTCACACTTTCATCGT 

TACTGTACTGGCAATCATTGCATTTCAGCCTCATAAAGAATGGAGATTTAT 

AGTTTACATTGTTCCACCACTTGTCATCACCATATCTACAGTACTTGCACA 

ACTACCCAGGAGATTCACAATCGTCAAAGTTGCTGTTTTTCTCCTAAGTTT 

CGGCTCTTTGCTCATATCCCTGTCGTTTCTTTTCATCTCATCGTATAACTAC 

CCTGGGGGTGAAGCTTTACAGCATTTGAACGAGAAACTCCTTCTACTGGA 

CCAAAGTTCCCTACCTGTTGATATTAAGGTTCATATGGATGTCCCTGCATG 

CATGACTGGGGTGACTTTATTTGGTTACTTGGATAACTCAAAATTGAACAA 

TTTAAGAATTGTCTATGATAAAACAGAAGACGAGTCGCTGGACACAATCT 

GGGATTCTTTCAATTATGTCATCTCCGAAATTGACTTGGATTCTTCGACTG 

CTCCCAAATGGGAGGGGGATTGGCTGAAGATTGATGTTGTCCAAGGCTAC 

AACGGCATCAATAAACAATCTATCAAAAATACAATTTTCAATTATGGAAT 

ACTTAAACGGATGATAAGAGACGCAACCAAACTTGATGTTGGATTTATTC 

GTACGGTCTTTCGATCCTTCATAAAATTTGATGATAAATTATTCATTTATG 

AGAGGAGCAGTCAAACCTGAAAATATATACCTCATTTGTTCAATTTGGTGT 

. . . — . ^- ^-.^-.^-i i m i o I /~»rr»r / ~irr»r>/-inn AAA TT^ A /Tl AAA l~2./~"~r A A *TTPP4 

ATTGCTGCAAAAAATACCAATGCCCATAA 
P. pastoris Alg12p

RMTTEELKISlTnTVTVIAIIAPQPHKEWR^^ 

KVAWLI^FGSLLISI^l^FISSYNYPGGEALQHLNEKLLLLDQSSLPVDIKVH 
MDWACMTGVTLFGYIJ}NSKI2^^ 

SSTAPKWEGDWIJmWQGYNGI^QSIKNTlFNYGILKRM^ 
RTVFRSFIKEDDKLFIYERSSQ 
P. pastoris ALG12 BLAST



                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value

gi|1302525|emb|CAA96310.1| ORF YNR030w [Saccharomyces cerev...        102   5e-21
gi|19112221|ref|NP_595429.1| putative involvement in cell w...         56   5e-07
gi|15864569|emb|CAC83681.1| putative dolichyl-p-man; Man7Gl...         53   4e-06
gi|13129114|ref|NP_077010.1| dolichyl-p-mannose:Man7GlcNAc2...         53   4e-06
gi|22266724|gb|AAM94900.1|AF311904_1 membrane protein SBB7...          53   4e-06
gi|18478284|emb|CAD22101.1| putative mannosyl transferase [M...        52   8e-06



Alignments 
S. cerevisiae 
Score = 102 bits (255), Expect = 5e-21 

Identities = 74/258 (28%), Positives = 121/258 (46%), Gaps = 19/258 (7%)

Query: 8 KMITEKLKXSHTFIVTVLAIIAFQPHKEWRFrVYIvPPLVT 187 

++ +LKI + + +++FQPHKEWRFI+Y VP +++ +T A L + K+ 

Sbjct: 302 KLAPAKLKIVSLASLFHI IVLSFQPHXEWRFI I YAVPS IMLLGATGAAHLWENMKVKKI T 361 

^ ^ + NYPGGEAL N+ ++ + VH+ 

Sbjct: 362 NVLCLAI LPLS IMTS FFISMAFLYI SRMNYPGGEALTS FNDMI V EKNTTNATVHIS 417 

Query: 347 WACMTCVTLFGYLDNSKIiNNIiRI^ - LDTIWDSFNYVT SEIDLDSS 505 

+P CMTGVTLFG L+ I YDKTE+ + L +W SF+++I S++ ++ 

Sbjct: 418 I PP CMTGVTL FGELNYGVYG 1 NYD K'l'K N TTLiiiQ EMWP S FD FL I THE P TAS QL P FENK 474 

Query: 506 TAPKWEGDWLKIDWQGYNGINKQS I KNTI FN YGI IiKRMI RD ATKLD VG F I RTVF 670 

T WE ++ + + G + IKN +F +LK++I D K F++ + 

Sbjct: 475 TTNHWE L VNTTKMFTGFDPTYI KNFVFQERVNVLSLLKQI IFD - - KTPTVFLKELT 528 



Query: 671 RSFIKFDDKLFIYERSSQ 724 

+ 1 D F Y+R Q 
Sbjct: 529 ANSIVKSDVFFTYKRIKQ 546



S. pombe

Score = 56.2 bits (134), Expect = 5e-07

Identities = 46/152 (30%), Positives = 62/152 (40%), Gaps = 11/152 (7%)

Query: 65 I IAFQPHKEWRFI VYI VPPLVITI STVLAQL PRRFTIVKVAVXXXXXXXXXX 220 

4 +F HKEWRFI+Y + P S+AL +F I+++ 

Sbjct: 295 vYSFLGHKEWRFIIYSI-PWFNAASAJGASLCFNASKF^ 353 

Query: 221 XXXXXXXXXYNYPGGEALQHIiNEKLLIiLiDQS SLPVDI KVHMDVPACMTGVTLFGYIiDNSK 400 

Y YPGG AL L E + VHMDV CMTG+T F L + 

Sbjct: 354 SSFLLYVFQYAYPGGLALTRLYE 1 ENHPQVSVHMDVYPCMTG I TRFSQLPS - - 404 
Query: 401 LNNLRIVYDKTEDESL DTIWDSFNYVTSE 487

YDKTED + F+Y+I+E 
Sbjct: 405 WYYDKTEDPKMLSNSLFISQFDYLITE 431 



Homo sapiens 
Score = 53.1 bits (126), Expect = 4e-06

Identities = 41/149 (27%), Positives = 68/149 (45%), Gaps = 6/149 (4%) 

Query: 59 IiAJIAFQPHKEWRFIvYIVPPLVTTISTVLAQLPRR FTIVKVAVXXXXXXXXXX 220 

+A + + PHKE RFI+Y P L IT + + L + +V 

Sbjct: 299 MALYSLLPHKELRFIIYAFPMIiia^ 358 

Query: 221 XXXXXXXXXYNYPTCEAIX}HLNEK^ 400 

+NYPGG A+Q L++ L+ Q+ D+ +H+DV A TGV+ F ++++ 
Sbjct: 359 SATALYVSHFNYPGGVAMQRLHQ— LVPPQT D VliLHI DVAAAQTGVS R FLQVNS AW 412 

Query: 401 LNNLRJVYDKTEDESLDTrWDSFNYVISE 487 

YDK ED T ++ +++ E 
Sbjct: 413 R YDKREDVQPGTGMLAYTHILME 435 
FIGURE 25



ATGGCCATTGGCAAAAGGTTACTGGTGAACAAACCAGCAGAAGAATCATT 

TTATGCTTCTCCAATGTATGATTTTTTGTATCCGTTTAGGCCAGTGGGGAA 

CCAATGGCTGCCAGAATATATTATCTTTGTATGTGCTGTAATACTGAGGTG 

CACAATTGGACTTGGTCCATATTCTGGGAAAGGCAGTCCACCGCTGTACG 

GCGATTTTGAGGCTCAGAGACATTGGATGGAAATTACGCAACATTTACCG 

CTTTCTAAGTGGTACTGGTATGATTTGCAATACTGGGGATTGGACTATCCA 

CCATTAACAGCATTTCATTCGTACCTTCTGGGCCTAATTGGATCTTTTTTCA 

ATCCATCTTGGTTTGCACTAGAAAAGTCACGTGGCTTTGAATCCCCCGATA 

ATGGCCTGAAAACATATATGCGTTCTACTGTCATCATTAGCGACATATTGT 

TTTACTTTCCTGCAGTAATATACTTTACTAAGTGGCTTGGTAGATATCGAA 

ACCAGTCGCCCATAGGACAATCTATTGCGGCATCAGCGATTTTGTTCCAAC 

CTTCATTAATGCrCATTGACCATGGGCACrTTCAATATAATTCAGTCATGC 

TTGGCCTTACTGCTTATGCCATAAATAACTTATTAGATGAGTATTATGCTA 

TGGCGGCCGTTTGTTTTGTCCTATCCATTTGTTTTAAACAAATGGCATTGTA 

TTATGCACCGATTTTTTTTGCTTATCTATTAAGTCGATCATTGCTGTTCCCC 

A A A TTT A ACATAGCTAGATTGACGGTTATTGCGTTTGCAACACTCGCAACT 

TTTGCTATAATATTTGCGCCATTATATTTCTTGGGAGGAGGATTAAAGAAT 

ATTCACCAATGTATTCACAGGATATTCCCTTTTGCCAGGGGCATCTTCGAA 

GACAAGGTTGCTAACTTCTGGTGCGTTACGAACGTGTTTGTAAAATACAA 

GGAAAGATTCACTATACAACAACTCCAGCTATATTCATTGATTGCCACCGT 

GATTGGTTTCTTACCAGCCATGATAATGACATTACTT CATCCCAAAAA GCA 

TCTTCTCCCATACGTGTTAATCGCATGTTCGATGTCCTITITrCTTTrrAGC 

TTTCAAGTACATGAGAAAACTATCCTCATCCCACTTTTGCCTATTACACTA 

CTCTACTCCTCTACTGATTGGAATGTTCTATCTCTTGTAAGTTGGATAAAC 

AATGTGGCTTTGTTTACGCTATGGCCTTTGTTGAAAAAGGACGGTCTTCAT 

TTACAGTATGCCGTATCTTTCTTACTAAGCAATTGGCTGATTGGAAATTTC 

AGTTTrATTACACCAAGGTTCTTGCCAAAATCTTTAACTCCTGGCCCTTCT 

ATCAGCAGCATCAATAGCGACTATAGAAGAAGAAGCTTACTGCCATATAA 

TGTGGTTTGGAAAAGTTTTATCATAGGAACGTATATTGCTATGGGCTTTTA 

TCATTTCTTAGATCAATTTGTAGCACCTCCATCGAAATATCCAGACTTGTG 

GGTGTTGTTGAACTGTGCTGTTGGGTTCATTTGCTTTAGCATATTTTGGCTA 

TGGTCTTATrACAAGATATTCACTTCCGGTAGCAAATCCATGAAGGACITG 

TAG 

S. cerevisiae ALG6p 

MMGKIU.LVNKPAEESFYASPMYDFLYPFRPVGNQW1PEYIIFVCAVILRCTIG 

LGPYSGKGSPPLYGDFEAQRHWMEITQHLPLSKWYWYDLQYWGLDYPPLTA 

FHSYLLGLIGSFFNPSWTAI^KSRGFESPDNGIXTYMRSTVnSDILFYFPAV^ 

FTKWLGRYPJ^QSPIGQSIAASAILFQPSLMLroHGHFQYNSVMLGLTAYAINN 

LIJ>EYYAMAJVVCFVI£ICFKQMALY^ 

ATLATFAIIFAPLYFIXSGGlJD^QCmRlEPFARGIFEDKVANF^ 
YKERFTIQQLQLYSLIATVIGFLPAMIMTLIi^ 
VHEKmiPLLPITLLYSSTDWNVLSLVSWIN^ 
VSFIiSNWUGNFSFriPRFIJKSLTPGPSISSIN^^ 

YIAMGFYHFIX>QFVAPPSKYPDLWVLLNCAVGFICFS]FWLWSYYKIFTSGSK 
SMKDL 
FIGURE 26

P. pastoris ALG6 

ATGCCACATAAAAGAACGCCCTCTAGCAGTCTGCTGTATGCAAGAATTCC 

AGGGATCTCTTTTGAAAACTCTCCGGTGTTTGATTTTTTGTCTCCTTTTGGA 

CCCGCTCCTAATCAATGGGTAGCACGATACATCATCATCATCTTTGCAATT 

CTCATCAGATTGGCAGTTGGGCTGGGCTCCTATTCCGGCTTCAACACCCCT 

CCAATGTATGGGGATTTTGAAGCTCAGAGGCATTGGATGGAAATTACTCA 

GCATTTATCCATAGAAAAATGGTACTTCTACGACTTGCAATATTGGGGGCT 

TGACTATCCTCCCTTGACAGCCTTTCATTCATACTTCTTTGGCAAATTAGGC 

AGCTTCATCAATCCAGCATGGTTTGCTTTAGACGTCTCCAGAGGGTTTGAA 

TCAGTGGATCTAAAATCGTACATGAGGGCGACCGCAATTCTCAGTGAGCT 

GTTATGTTTTATTCCAGCTGTCATTTGGTATTGTCGTTGGATGGGACTTAAC 

TACTTCAATCAAAACGCCATTGAGCAAACTATAATAGCGTCTGGTATTCTT 

TTCAATCCATCTTTAATTATCATAGATCATGGCCACTTCCAGTACAACTCA 

GTTATGCTAGGTTTTGCTTTATTATCCATATTAAATCTGTTGTACGATAATT 

rTGCATTAGCGGCTATTTTITrCGTTCTTTCAATAAGCTTTAAGCAAATGGC 

TCTCTATTATAGGCCCATCATGTTTTTTTACATGCTGAGTGTGAGTTGTTGG 

CCTTTGAAAAACTTCAACTTGTTGAGATTGGCTACTATCAGTATTGCAGTA 

CTCTTGACTTTTGGAACTCTATTACTGGCTTTTGTATTAGTAGATGGGATGT 

CACAAATTGGCCAAATATTATTCAGAGTTTTCCCGTTTTCAAGAGGCTTGT 

TTGAGGATAAGGTGGCCAACTTTTGGTGTACAACGAATATACTGGTAAAG 

TACAAACAGTTATTCACTGACAAAACCCTTACTAGGATATCGCTAGTAGC 

AACTTTGATTGCAATTAGTCCGTCTTGCTTCATCATTTTTACTCACCCAAAG 

AAGGTTTTACTACCGTGGGCTTTTGCTGGTTGCTCTTGGGCGTTCTATCTTT 

TCTCTTTCCAAGTCCACGAGAAATCAGTTTTAGTTCCATTGATGCCTACCA 

CTCTATTACTGGTAGAAAAAGACTTGGACATCATCTCAATGGTCTGCTGGA 

TTTCTAATATTGCCTTCTTCAGGATGTGGCCTCTATTAAAAAGAGACGGGC 

TGGGTTTGGAATATTTTGTCTTGGGAATATTGAGTAATTGGCTGATTGGAA 

ACCTCAATTGGATTAGTAAATGGCTTGTCCCCAGTTTCCTGATTCCAGGGC 

CTACTCTCTCCAAAAAAGTTCCTAAAAGAGATACTAAAACAGTTGTTCAT 

ACTCACTGGTTTTGGGGGTCAGTAACATTCGTTTCATACCTCGGAGCTACA 

GTTATCCAGTTCGTAGATTGGCTGTACCTTCCACCTGCCAAGTATCCAGAT 

TTGTGGGTTATTTTGAACACTACATTGTCGTTTGCTTGTTTCGG 

GGCTATGGATTAACTACAATCTGTACATTTTGCGTGATTTTAAGCTTAAAG 

ATGCTTAG 

P. pastoris Alg6 

MPHKRTPSSSLLYAKJPGISFENSPVFDFLSPFGPAPNQWVARYnilFAILIRLAV 
GLGSYSGFNTPPMYGDFEAQRHWMEIT^^ 

AFHSYFFGKLGSFIM'AWFALDVSRGFESVDLKSYMP^TAII^ELLCFIPAVTW 
YCRWMGLNYFNQNAIEQTIIASAILFT^ 

LYDNFALAAIFFVLS ISFKQMALYYSPIMFTYMLS VSC WPLKNFl^ILLRLATISI 
AVLLTFATLLLPFVLVDGMSQIGQnJRVFPF^ 

YKQLFTDKTLIiaSLVATLIAISPSCTIIFTHPKKVLIRWAFAACSWAFYIRSFQ 

VHEKSVLWmPTTLLLVEKDLDra 

VLGE^NWLKjNWWISKWL^ 

VTFVSYLGATVIQFVDWLYLPPAKYPDL^ 

YILRDFKLKDA 
P. pastoris ALG6 BLAST

                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value



3i 



3i 



Si 



1420090 | eiDblCAA99190.il ORF YOR002w J Saccharoses cerev. . e-137 



74905B4 [pirj 



19921070 lref 



2iJ 

ai 



Ji-JU . J. | wav* » - - 

T40396 glucosyltransf erase - fission yeast . . .369 



NP 609393. l| CG5091-PA [Drosophila melanoga . . . 47 4e-64 



gi 



ai 



ai 



a* 



ai 



ai 



V524Q9 20lref iNP 19B662.lj glucosyltransf erase -like prote . . .244 3e-63 
7019325 I ref INP 037471.11 dolichyl-P-GlC:Man9GlcNAc2-PP-d. . .|2| 2e-61 



l2002040|gbiAAG43163.iTAF063604 1 brain my04 6 protein [H. . .2|| 7e-6l 



n7fi67llsPl O09226lAIiG6 CAEEL Probable dolichyl pyrophosp. . .222 3e "*' 
— ■ ■ --- - agCP4617 [Anopheles gambiae str 219 Be-56 



21302638 | qb I EAA147B3 .1 
54417B8 | etnb | CAB46771 . 1 



aw * ooj | ^ probable glucosyltransf erase [Sc . . le-47 

13129070 1 ref INP 0769B47l| hypothetical protein MGC2B4 0 S...112 le-23 

. ' . . ' Z~ i -i 1 j _ jr — ^ rUrann enni one! 115 lS~23 



1j l^JU / w I XtSJ. I JIC vtv^v^.+j »Ji r r , - „ 

2996578 |emb I CAA12176.ll glucosyltransf erase [Homo sapiens] 112 ie-^J 



20835439lref {XP 131506Tl| similar to Dolichyl pyrophosph. . .104 3e-21 



Alignments 



Score = 489 bits (1259), Expect = e-137

Identities = 274/530 (51%), Positives = 358/530 (67%), Gaps = 5/530 (0%)

Query: 20 Sin^SPVFT^PFGPAFNQWVX^^ 79

SF SP++DFL PF P NQW+ +G3jG YSG +PP+YGDFEAQRH 

Sbjct: 16 S FYASPN1YDFLYPFRPVGNQWLPEYT I FVCAVILRCTI GLGP YSGKGSPPLYGDFEAQRH 75 

Query: 80 Wl^I TQHLS IEKWYFYDl^YWGLiDYPPLTAFlK 139

WMEITQHL + KWY+YDLQYWGLDYPPLTAFHSY G +GSF NP+WFAL+ SRGFES D 
Sbjct: 76 WMEITQHLPLSKWYVTTOI^ 135 

Query: 140 - -LKSYMRATAILSEIilXIFIPAVIWYOT 197

IiK+YMR+T I+S++L + PAVI++ +W+G Y NQ+ I Q+I ASAILF PSL++IDH 
Sbjct: 136 NGLKTYT^TVIISDILl^FPAVIYl^IOfLG-RYRN 194 

Query: 198 C^QYNSVm/3FALI^Il^ 257

GHFQYNSVMLG +1 NLL + +A+AA+ FVIiSI FKQMALYY+PI F Y+LS S 
Sbjct: 195 GHFQYNSVMLGLTAYAINI^IjDEYYAMAAVCFVIjS I CFKQMALYYAPI FFAYIiLSRSLL- 253 

Query: 258 LKNFNLLRIAT I S IAVLLTFATLLLP - FVLYDGMSQ I GQ I LFRVFP FS RGIiFEDKVANFW 316 

FN+ RIi 1+ A L TFA + P + Ii G+ I Q + R+FPF+RG+FEDKVANFW 
Sbjct: 254 FPKFNI ARLTTOkFATLATFAI I FAPLYFLGGGLKNIHQCIHRI FPFARGI FEDKVANFW 313 

Query: 317 CITl^LVKYKQLFTDKTLTRI SLVATLIAI S PS CFI I FTHPKKVLJjPWAFAACSWAFYLF 376 

C TO+ VKYK+ FT + L SL+AT+I P+ + HPKK LLP+ ACS +F+LF 
Sbjct: 314 CVTl^VKYKERFTIOQI^^ 373 

Query: 377 SFQVHEKSXXXXXXXXXXXXXEK^ 436

SFQVHEK+ D +++S+V WI+N+A F++WPLLK+DGL L+Y V + 

Sbjct: 374 SFQVHEKTILIPl^PITIiYSSTOWNV^ 433 

Query: 437 LSI^IGNLlWISKWLv^ 496 

LSNWLIGN ++I+ +P L PGP4-+S ++++ + W S +Y+ 

Sbjct: 434 LSl^It^SFITPRFLPKSIiTPGPSISSI^ 493 
Query: 497 QFVDWLYLPPAJCYPDLVTVTiLNTTLSFACFGLFWLW^ 546

F+D PP+KYPDLWV+LN + F CF +FWLW Y 4-+ +KD 
Sbjct: 494 HFIJDQFVAP PSKY PDIiWVLLNCAVCSFI CFS I FWLWS YYKI FTSGSKSMKD 543 

S. pombe 

Score = 369 bits (946), Expect = e-101

Identities = 228/513 (44%), Positives = 315/513 (61%), Gaps = 35/513 (6%) 

Query: 21 FEN- SPVFDFLSPFGPAPNQWVXXXXXXXXXXX^^ 79

FEN +PV F+S F ++++ + +G YSG+NTPPMYGDFEAQRH 

Sbjct: 5 FENGAPVQQFVSRFRSYSSKFLFFPCLIMSLVPT5QWLISIGPYSGYNTPPMYGDFEAQRH 64 

Query: 80 V^ITQHLSIEKWYFYDLQYWGIjDYPPLTAFHSre^ 138 

WME+T H + +WYF DLQ+WGLDYPPLTA+ S+FFG +G F NP WFA SRGFES+ 
Sbjct: 65 Vn^LTLHTPVSQWYFRDIiQWWGLDYPPLTAYVSWFFGI I GHYFFNPEWFADVTSRGFESL 124 

Query: 139 DLKSYMRATAIIiSELLCFIPAVTWYCRWMGIiNYFNQNM 198 

+LK +MR+T I S LL +P -M-+Y +W N +++ +LF P+L++IDHG 

Sbjct: 125 ELKLFMRSTVIASHLLILTOPLMFYSKWWSRRI - -PNFVDRNASLIMVLFQPALLLIDHG 182 

Query: 199 HFQ YNSVMLGFALLS I LNLL YDNF ALAAI FFVLS I S FKQMAL YYS P IMFF YMLSVS CWPL 258 

HFQYN VMLG + +1 NLL + '+ A FF L+++FKQMALY++P +FFY+L P 
Sbjct: 183 HFQ YNC^/MLGL VMYAIANLLKNQYVAAT F FFCLALT FKQMAL YFAP P I FFYLLGTCVKPK 242 

Query: 259 KNFNLLRI^TISIAVLLTFATLI^^ 318 

F+ R +S+ V+ TF+ +L P++ +D +■ + QIL RVFFF+RGL+EDKVANFWCT 
Sbjct: 243 IRFS--RFILI^VTVVFTFSLILFPWIYMDYKTLLPQIIiHRVFPFARGL 300 

Query: 319 TNILVKYKQLFTDKTLTRI SLVATLI AI S PS CFI I FTHPKKVIjLPWAFAACSWAFYLFSF 378 

N + X +++FT L ISL+ TLI+I PSC I+F +P+K LL FA+ SW F+LFSF 
Sbjct: 301 LNTVTKIREVFTLHQLQVISLIFTLISILPSOT 360 

Query: 379 QVHEKSXXXXXXXXXXXXXEKDIM 438 

QVHEKS ++ + +N+A FS+WPLLK+DGL L+YF L ++ 

Sbjct: 361 QVHEKSVXLPLLPTSIIiCHGNICTKPWIALAN^ 420 

Query: 439 NWLIGNLNWISKWLVPSFLIPGPTLSKKVPKRDTKTVVHT^ 498 

NW IG++ SK ++ F + Y+G VI 

Sbjct: 421 NW - 1 GDMWFS KNVLFRF IQLSFYVGMIVILG 451 

Query: 499 VDWL YLP PAKY PDLWVT LNTTLS FACFGLFWLW 531 

+D PP++YPDLWVTLN TLSFA F +LW 
Sbjct: 452 IDLFIPPPSRYPDLWILNVTLSFAGFFTIYLW 484 



D. melanogaster

Score = 247 bits (630), Expect = 4e-64 

Identities = 175/490 (35%), Positives = 267/490 (54%), Gaps = 55/490 (11%) 

Query: 57 VGLGSYSGFNTPPMYGDFEAQRHWMEITQHLSIEKWYF YDLQYWGLDYRPLTAFHS 112 

+ L S YSGF++ PPM+GD+EAQRHW EIT +L++ +WY DLQYWGLDYPPLTA+HS 
Sbjct: 19 I SL YS YSGFDS P PMHGD YEAQRHWQEI TVNLAVGEWYTNS SNNDLQ YWGLD YP PLTAYHS 78 

Query: 113 YFFGKLGS FINPAWFALDVSRGFES VDLKS YMRATAI LS ELLCFI PA VT WYCRWMGLNYF 172 

Y G++G+ I+P + L SRGFES + K +MRAT + +++L ++PA++ + + 

Sbjct: 79 YLVGRIGA5IDPRFVELHK5RGFESKEHKRFMRATW5A 138 
Query: 173 NQNAI EQTI I ASAILPNPS LI 1 1 DHGHFQ YNSVMLG FALLS ILNLL YDNFALAAI FFVLS 232

+ + + + +A P +ID+GHFQYN++ LGFA ++I +L F AA FF L+ 
Sbjct: 139 SDDKLFLFTLVAAY- - - PGQTL I DNGHFQ YNNX S LGFAAVAXAAJ LRRRFYAAAF F FTLA 195 

Query: 233 I S FKQMALYY S P IMFFYMLS VS CWPLKNFN - - LLRLAT I S IAVLLTFATLLLP FVLVDGM 290

+++KQM LY+S + FF L C K+F + ++ 1+ VL TFA L +P+ + + 
Sbjct: 196 LNYKQMEL YHS - LP FFAFLLGECVSQKS FASFIAEI SRIAAWLGTFAI LWVP W --LGSL 252 

Query: 291 SQIGQILFRVFPFSRGLFEDKVANFWCTTNTLVKYKQLFTO 350 

+ Q+L R+FP +RG+FEDKVAN WC N++ K K+ ++ + + + TLIA P+ 
Sbjct: 253 QAVLQVLHRLFPVARGVFEDKVANWCAVNVV^^ SNDQMALVCI ACTLI ASLPTN 312 

Query: 351 FIIFTHPKKVIiPWAFAA^AFYLFSTO 409

++F V A S AF+LFS FQVHEK+ + + CW 

Sbjct: 313 VLLFRRRTNVGFLLALFNTSLAFFLFS FQVHEKTI LLTALPA LFLLKCWP 362 

Query: 410 I SOT AFFSMWPLLKRDGLALETFVLGI LSNWL I GNLNWI S KWLVPS FLI PGPTLS 464 

+ FSM PLL RDL+V + ++ +SK LS 
Sbjct: 363 DEMILFIiEVTVFSMLPLIjARDEIjLVPAWATVAFHLIFKCFDSKSK : LS 410 

Query: 465 KKVPKKDTKTVVOTHWFWGSVTFVS YLGATVI QFVDWLYLP - PAKYPDLWVTLNTTL5 FA 523

+ P+ + ++S + A+ L+PP KYPDLW ++ + S 

Sbjct: 411 NEYPLKYIANI SQILMISVWAS-- LTVPAPTKYPDLWPLllbVTSUU 456 

Query: 524 CFGLFWLWIN 533 

F LF+LW N 
Sbjct: 457 HFFLFFLWGN 466 



A. thaliana

Score = 244 bits (622), Expect = 3e-63
Identities = 187/488 (38%), Positives = 248/488 (50%), Gaps = 39/488 (7%)

Query: 62  YSGFNTPPMYGDFEAQRHWMEITQHLSIEKWY FYDLQYWGLDYPPLTAFHSYFFGK 117
           YSG PP +GDFEAQRHWMEIT +L + WY + DL YWGLDYP PLTA+ SY G
Sbjct: 61

Query: 118
           F NP AL SRG ES K MR T + S+ F PA +++ N
Sbjct: 121 FLRFFNP ESVALLS S RGHE S YLG KLLMRWTVLS SD AF IFF P AAL F FVLVYHRNRTRGG KS

Query: 178 EQTI I ASAILFKTPSLI I IDHGHFQYNSVMIjGFAIjIiSIIi^LYDNFAIiAAIFFVIiS ISF^
           E + IL NP LI+IDHGHFQYN + LG + +1 +L ++ L + F L++S KQ
Sbjct: 181 EVAWHIAMILLOTCLILIDHGHFQYNCISLGLTVG

Query: 238 MALYY S P IMFFYMLSVS CWPLKNFNLLRLAT I S I AVLLTFATLLLPFVLIVDGMSQ I GQI L
           M+ Y++P F ++L C K+ +L + + IAV++TF P+ V + +L
Sbjct: 241 MSAYFAPAFFSHLLG-KCLRRKS- PILSTVIKIjGIAVIVTFVTFTWPY- -VHSLDDFLMVL

Query: 298 FRVFPFSRGLFEDKVANFWCTTNILVKYKQLFTDKTLTRI SLVATLIAI S PSCFI I FTHP
           R+ PF RG++ED VANFWCTT+IL+K+K LFT ++L ISL AT++A PS P
Sbjct: 297

Query: 358
           S AFYLFSFQVHEKS L + ++ A FS
Sbjct: 357 3 SMAFYLFS FOVHE KSILMP FLSATLLA LKLPDHFSHLTYYALFS 412
Query: 418 MWPLLKRDGLALEYFVLGILSNWLI GNLNWISKWLVPSFL IPGPTLSKKVPKRD 471

M+PLL RDL+YLL + GN+IKVF PG 
Sbjct: 413 MFPLIiCRDKLIjI P YLTLS FLFTVT YHS PGNHHAI QKTDVS FFS FKNFPGYVF 464 

Query: 472 TKTWHIHWFWGSVTFVSYIjC^TVIQF 531 

++ TH+F V V YL PP KYP L+ L L F+ F +F + 

Sbjct: 465 LLRTHFF I SWLHVLYLTI K PPQKYPFLFEALIMILCFSYFIMFAFY 511 

Query: 532 INYKLYIL 539 

NY + L 
Sbjct: 512 TOYTQWTI, 519 
K. lactis ALG6

ATCTCTGTTTCAACAGCTCTTGCATTCATTGGrrCTTTCGGTCCAATCTATA 

TCTTTGGAGGATACAAGAACTTAGTGCAATCAATGCACAGGATTTTTCCAT 

TTGCCAGGGGTATCTTTGAAGATAAAGTTGCGAATTTTTGGTGCGTTTCTA 

ATATTTTCATCAAATATAGAAATCTATTCACTCAGAAGGATCTTCAATTAT 

ACTCATTACTCGCAACAGTTATTGGGCITTTACCATCATTCATTATAACAT 

TTTTATACCCGAAGAGACATTTACTACCATATGCTTTGGCCGCATGTTCGA 

TGTCATTCTTCTTATTCAGCTTCCAGGTTCATGAAAAGACAATCTTATTAC 

CTTTACTTCCTATTACACTCTTGTACACGTCAAGAGATTGGAATGTTCTAT 

CATTGGTTTGTTGGATTAACAACGTGGCATTGTTTACACTCTGGCCATTAC 

TGAAAAAGGACAATCTAGTATTGCAATATGGAGTCATGTTCATGTTTAGC 

AATTGGTTGATCGGTAACTTCAGTTTCGTCACACCACGCTTCCTCCCAAAA 

TTTTTGACACCAGGGCCATCCATCAGTGATATAGATGTTGATTATAGACGG 

GCAAGTTTACTACCCAAGAGCCTAATATGGAGATTAATCATTGTTGGCTCA 

TATATTGCAATGGGGATTATTCATTTTCTAGACTATTACGTCTCCCCGCCA 

TCAAAATACCCTGATTTATGGGTGCTTGCCAATTGTTCCTTGGGCTTCTCA 

TGTTTTGTGACATTTTGGATATGGAACAATTATAATTATTCGAAATGAGAA 

ACAGCACTTTGCAAGATTTA 



K. lactis Alg6p 

ISVSTALAHGSFGPmFGGYKmVQSMHRIFPFARGIFEDKVAlvrFWCVSNIFIK 
YPJttJTTQKDLQLYSLLATVIGLLPSFIlTF^ 

VHEKTELIJ'LLPITLLYTSRD WNVLSLVCV^NNV ALFTLWPLIXKDNLVLQYG 

VMFMFSNWLIGNFSFVTPPJFI^KFLT^ 

GSYIAMGIIfflTJDYYVSPPSKYPDLWVLANC^ 

TALCKI 
K. lactis ALG6 BLAST



                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value

gi|1420090|emb|CAA99190.1| ORF YOR002w [Saccharomyces cerev...        392   e-108
gi|7490584|pir||T40396 glucosyltransferase - fission yeast            187   2e-46
gi|15240920|ref|NP_198662.1| glucosyltransferase-like prote...        117   2e-25
gi|7019325|ref|NP_037471.1| dolichyl-P-Glc:Man9GlcNAc2-PP-d...        103   2e-21
gi|12002040|gb|AAG43163.1|AF063604_1 brain my046 protein [H...        102   8e-21
gi|19921070|ref|NP_609393.1| CG5091-PA [Drosophila melanoga...        101   1e-20

Alignments 



S. cerevisiae 
Score = 392 bits (1006), Expect = e-108

Identities = 182/280 (65%), Positives = 218/280 (77%), Gaps = 1/280 (0%)
Frame = +1

Query: 1 I SV S TALAFI GS FGP I YI FGG- YKNLVQSMHRI FP FARG I FEDKVANFWCVSNT F I KYRN 177 

1+ +T F F P+Y GG KN+Q +HRI FP FARG I FEDKVANFWCV +N+ F+ KY+ 
Sbjct: 265 IAFATLATFAI I FAPIiYFLGGGLKNIHQCIHRI FPFARGI FEDKVANFWCVTNVFVKY7CE 324 

Query: 178 LFTQKDLQLYSLLATVIGLLPSFIITFLyPKRHLLPYAIiAACS 357

FT + LQLYSL+ATVTG LP+ I+T L+PK+HLLPY L ACSMSFFLFS FQVHEK 
Sbjct: 325 RFTIQQLQLYSLIATVIGFLPAMIMTLLHPK^^ 384 

Query: 358 XXXXXXXXYTSRDWNVLSLVCW^ 537

1 Y+S DWNVLSLV WINNVALFTLWPLLKKD L LQY V F+ SNWLIGNFSF 

Sbjct: 385 PLLPI TLLYS S TDWNVLSLVS WINNVAL FTLWPLLKKDGLHLQYAVS FIiLSNWL IGNFS F 444

Query: 538 VTPRFLPKFLTPGPSISDIDVDYRRASIiLPKSIiIWRLIIVGSYIAMGIIHFIiDYYVSPPS 717

+TPRFLPK LTPGPSIS 1+ DYRR SLLP +++W+ I+G+YIAMG HFIiD +V+PPS 
Sbjct: 445 ITPRFLPKSLTPGPSISSINSDYRRRSLLPYNVVWKSFIIGTYIAMGFYHFIJ3Q 504 

Query: 718 KYPDLWVLANCS LGFS CFVT FWI WNNYXL FEMRNS TLQDL 837 

KYPDLWVL NC++GF CF FW+W+ Y +F. + +++DL 
Sbjct: 505 KYPDLWVIiLNCAVGFI CFS I FWLWS YYKI FTSGSKSMKDL 544 



S. pombe 

Score = 187 bits (475), Expect = 2e-46
Identities = 106/280 (37%), Positives = 150/280 (53%), Gaps = 1/280 (0%)
Frame = +1

Query: 1 I SVSTALAFI GS FGPI YI FGGYKNLV-QSMHRI FPFARGI FEDKVANFWCVSNI FI KYRN 177 

+SV+ F P +1+ YK Ii+ Q +HR+FPFARG+ +EDKVANFWC N K R 

Sbjct: 251 IiSVTVVFTFSLILFP-WIYMDYKTLL^^ 309 

Query: 178 LFTQKDLQLYSLLATVI GLLPSFI I TFLYPKRHLLP YALAACSMSFFLFSFQVHEKXXXX 357 

+FT LQ+ SL+ T+I +LPS +1 FLYP++ LL A+ S FFLFSFQVHEK 
Sbjct: 310 VTTLHQI^VISLIFTLISILPSCVTLFLYPRKRI^^ 369
Query: 358 XXXXXXXXTTSRDWNV^^ 537

+ + NN+A+ F+ LWPLLKKD L LQY + + NW 

Sbjct: 370 PliPTSIliCHGMTTKPWIALAin^ 422 

Query: 538 VTPRFLPKFLTPGPSISDIDVDYRRASLLPKSLIW^ 717

J " I D+ V K++++R I + Y+ M +1 +D ++ PPS 

Sbjct: 423 - IGDMW FSKNVLFRFIQLSFYVGMIVILGIDLFIPPPS 460 

Query: 718 KYPDLWVLANCSKSFSCFVTFWIVmNYXLFEMRNSTLQDL 837 

+YPDLWV+ N +L F+ F T ++W L + + DL 
Sbjct: 461 RYPDLWVILNVTIiSFAGFFTIYIiWTLGRLLHISSKLSTDL 500 

A. thaliana
Score = 117 bits (292), Expect = 2e-25

Identities = 81/240 (33%), Positives = 120/240 (50%), Gaps = 2/240 (0%) 
Frame = +1 

Query: 85 MHRI FPFARGI FEDKVANFWCVSNT FI KYRNLFTQKDLQLYS LLATVI GLLPS FI ITFLY 264 

+ R+ PF RGI+ED VANFWC ++I IK++NLFT + L+ SL AT++ LPS + L 
Sbjct: 296 LSRLAPFERGI YEDYVANFWCTTS ILI KWTCNLFTTQSLKSISLAATILASLiPSMVQQI LS 355 

Query: 265 PlttHMiPYAIiAACSMSFFLFSFQvHEKXX^^ 444 

p Y L SM+F+LFSFQVHEK + L + ALF 
Sbjct: 356 PSNEGFLYGLIiNS SMAFYLFS FQVHEKS ILMPFIiSATLIiALKLPDHFSHLTYY ALF 411 

Query: 445 TLWPLLKKDNLVLQYGVMFMFSNWLIGNFSFVTPRFLPKFLTPG--PSISDIDVDYRRAS 618

+++PLL +D L++ Y * + SF+ F + +PG +1 DV + 

Sbjct: 412 SMFPLLCRDKLLI PYLTL t — SFL- --FTVIYHSPGNHHAIQKTDVSFFSFK 457 

Query: 619 LLPKSLIWRLIIVGSYIAMGIIHFLDYYVSPPSKYPDLWVLANCSL^ 798

p + L+ +I++ ++H L + PP KYP Ii+ L FS F+ F + NY 

Sbjct: 458 NFPGYW--IiIiRTHFFISV-VL£TVLY^ 514 

H. sapiens 
Score = 103 bits (258), Expect = 2e-21

Identities = 78/266 (29%), Positives = 123/266 (46%), Gaps = 3/266 (1%) 

Frame = +1

Query: 7 VSTALAFIGSFGPIYI--FGGYKNLVQSMHRIFPFARGIFEDKVANFWCVSNIFIKYRNL 180
V A + SF ++ F + +Q + R+FP RG+FEDKVAN WC N+F+K +++
Sbjct: 232 VKLACIWASF\TLCWLPFFTEREQTLQVLRRLFPVDRGLFEDKVANIWCSFNVFLKIKDI 291

Query: 181 FTQKDLQLYSLLATVIGLLPSFIITFLYPKRHLLPYALAACSMSFFLFSFQVHEKXXXXX 360
+ + S T + LLP+ I LP + L +C++SFFLFSFQVHEK
Sbjct: 292 LPRHIQLIMSFCFTFLSLLPACIKLILQPSSKGFKFTLVSCALSFFLFSFQVHEKSILLV 351

Query: 361 XXXXXXXYTSRDWl^iSLVCWINNVALFTLWPIiLKm 537
+ + + w V+ F++ PLL KD L++ V M F + +FS
Sbjct: 352 SLPVCLVLS EI PFMSTWFI^VSTFSMLPIiliLKDELIJ^SVNrrTMAFFIACVTS FS I 407

Query: 538 VTPRFLPKFLTPGPSISDIDVDYRRASLLPKSLIWRLIIVGSYIAMGIIHFLDYYVSPPS 717
+ SIS V SI+++SIM+++ +PP
Sbjct: 408 FEKTSEEELQLKSFSIS VRKYLPCFTFLSRIIQyLFLISVITMVLLTLMT\rTLDPPQ 464

Query: 718 KYPDLWVLANCSLGFSCFVTFWIWNN 795
K PDL+ + C + F+ F ++ N
Sbjct: 465 KLPDLFSVLVCFVSCLNFLFFLVYFN 490
FIGURE 31



α-factor
TC^TTCAAACTGAAAAC^AAA

GGGGCCGATCCTA^CCAATTAATTTATTTATTTGGGAGGATGGGGGCGGGCTCGGG 

AGGGAGGAGAGGGGTTGAACAGTTTCCTTTTGTTCCT(^CTGTTAATTCGCCCACCT 

TCGGGCCCTTCTTGTTCTGCAGCGCCAAGCAGGGTGCAGAGGGGCTGTGGCTTGCTT 

GAGGGGCCACTGTGGGGCTTCACTCCTGGTCACAGGTGGCAGC^ 

TCTATAAGCAGGGK-JGATGTAGCTCAGTT^ 

TCCTGGGTTCGATCCCCAGCACCA^ 

CCAAGCATTCTCCTTGGCTACATJ^ 

TACAAGAGACCCTATCTCAGAAAAT^ 

AAACACAGCCAGTCACTGTCACTGCATTC 

GGCAGATAACAGCTAAAAGGCACATAACCTTGGTGGGGAAATAAATGCCTGTGGTGT 
CCTGAGGGCCCCACCAAGTTCCAAAAAAAAAAAA 



>gi|18997007|gb|AAL83249.1|AF474154_1 N-acetylglucosaminyltransferase V [Mus musculus]

MAFFSPWIOLiSSQKIjGFFLVTFGFIWGMMLLHFTIQQRTQPESSSM^ 

I KAIAEENRD VVDGP YAGVMTAYDLKXTIAVLLDNI LQRI GKLE S KVDNLVNGTGAN 

STNSTTAVPSLVSLEKZNVADIINGVQEKC^ 

YAD YGVDGTS CS FF I YLS EVENWCPRLPWRAKNP YEEADHNSLAE I RTD FNI LYGMM 
K3GD2EFRWMRLRIRRMADAWIQA 

ETAFSGGPLGELVQWSDL I TSLYLLGHDIRI SASLAELKEIMKKWGNRSGCPTVGD 

RIVELIYIDIVGLAQFKKirLGPSWVHYQCMLRVLDSFGTEPEFNHASY^ 

WGKWNLNPQQFYTMFPHT PDNS FLGFWEQHLNS SD I HHINE I KRQNQSLVYGKVDS 

FWKNKKI YLDI I HT YME VHATVYG S STKNT PS YVKNHGILSGRDLQFLLRETKLFVG 

LGFPYEGPAPLEAIANGCAFLNPKFNPPKSSKNTD 

GRPHVWTVDLNNREEVEDAVKAILNQO 

QVMWPPLSALQVKLAEPGQSCKQVCQE^^^ 

LYKD I L VP S FYPKS KHCVFQGDLLLFS CAGAHPTHQR I CPCRDF I KGQVALCKDCL 